ABSTRACT 

Since gamma-ray burst afterglows were first detected in 1997, the relativistic 
fireball model has emerged as the leading theoretical explanation of the afterglows. In 
this paper, we present a very general, Bayesian inference formalism with which this, 
or any other, afterglow model can be tested, and with which the parameter values 
of acceptable models can be constrained, given the available photometry. However, 
before model comparison or parameter estimation can be attempted, one must also 
consider the physical processes that affect the afterglow as it propagates along the line 
of sight from the burst source to the observer. Namely, how does extinction by dust, 
both in the host galaxy and in our galaxy, and absorption by the Lyα forest and by H I 
in the host galaxy, change the intrinsic spectrum of the afterglow? Consequently, we 
also present in this paper a very general, eight-parameter dust extinction curve model, 
and a two-parameter model of the Lyα forest flux deficit versus redshift distribution. 
Using fitted extinction curves from Milky Way and Magellanic Cloud lines of sight, 
and measurements of Lyα forest flux deficits from quasar absorption line systems, 
we construct a Bayesian prior probability distribution that weights this additional, 
but necessary, parameter space such that the volume of the solution space is reduced 
significantly, a priori. Finally, we discuss the broad applicability of these results 
to the modeling of light from all other extragalactic point sources, such as Type Ia 
supernovae. 
1. Introduction 

Optical afterglows have been detected for at least twelve gamma-ray bursts (GRBs); 
underlying galaxies have been detected for at least seven of these. Underlying galaxies have been 
detected by high-resolution imaging with HST [Sahu et al. 1997; Fruchter et al. 1999a (GRB 
970228); Fruchter et al. 1999b (GRB 970508); Kulkarni et al. 1998 (GRB 971214); Fruchter 1999b, 
private communication (GRB 980329); Bloom et al. 1999a; Fruchter et al. 1999c (GRB 990123); 
Fruchter 1999c, private communication (GRB 990712)], by medium-resolution, ground-based 
imaging [e.g., Djorgovski et al. 1998a,b (GRB 980613)], by detecting emission lines at afterglow 
locations [e.g., Djorgovski et al. 1998c (GRB 980613); Djorgovski et al. 1998d (GRB 980703)], and by 
sampling afterglow light curves until an asymptotic value is approached [e.g., Bloom et al. 1998 
(GRB 980703)]. However, this last method is not always reliable, as Bloom et al. (1999b) have 
shown that a brightening supernova component to an afterglow light curve can be misinterpreted 
as being due to an underlying galaxy if the light curve is not sufficiently well-sampled at late 
times (see also Hjorth et al. 1999). Lamb (1999) has shown that underlying galaxies that have 
been confirmed to be coincident with their afterglows by high-resolution, HST imaging are host 
galaxies to a high degree of certainty; however, 10-15% of the remaining underlying galaxies 
are probably chance coincidences. Consequently, at least six of these underlying galaxies are host 
galaxies, and the remaining one or two underlying galaxies are very likely to be host galaxies as 
well. 

Since many, if not all, of the long bursts with detected optical afterglows are associated with 
host galaxies, these afterglows are likely to be extinguished by dust in their host galaxies (Reichart 
1997), as well as by dust in our galaxy, and absorbed by H I in their host galaxies, as well as by the 
Lyα forest (Fruchter 1999a). These physical processes affect - in some cases, probably significantly 
- the observed spectra of afterglows from the infrared (IR) through the ultraviolet (UV). For 
example, Lamb & Reichart (2000a, b) suggest that some of the ~13 bursts with securely-detected 
X-ray afterglows, but without securely-detected optical afterglows, might be explained by large 
amounts of extinction by dust in their host galaxies, probably from the immediate vicinities of 
these bursts if they are indeed associated with star-forming regions (see Lamb & Reichart 2000a 
for a discussion of the evidence in favor of this association), or by absorption by the Lyα forest if 
these bursts occur at very high redshifts (z ≳ 5). 

Since the majority of afterglow observations are made at optical and near-infrared wavelengths, 
the effects of these physical processes on the observed spectra cannot be ignored, particularly since 
these spectra have been redshifted. Indeed, these effects must be carefully modeled if intrinsic 
spectra are to be recovered. This is the primary purpose of this paper. The secondary purpose of 
this paper is to present a very general, Bayesian inference formalism with which afterglow models 
can be tested, and with which the parameter values of acceptable models can be constrained, 
given the available photometry. We begin with this in §2. Also in §2, we develop and present a 
formalism for the construction of Bayesian prior probability distributions from multi-dimensional 
data sets, which we draw on extensively in §4 and §5. In §3, we present an eight-parameter dust 
extinction curve model, based on the work of Fitzpatrick & Massa (1988) and Cardelli, Clayton, & 
Mathis (1989). In §4, we construct a prior that weights this additional parameter space such that 
the volume of the solution space is reduced significantly, a priori, using fitted extinction curves 
from Milky Way and Magellanic Cloud lines of sight. In §5, we present a two-parameter model 
of the Lyα forest flux deficit versus redshift distribution, and we construct an analogous prior 
using Lyα forest flux deficit measurements from quasar absorption line systems. In §6, we present 
a wide variety of extinguished and absorbed spectral flux distributions, using these models. In 
§7, we draw conclusions, including a discussion of the broad applicability of these results to the 
modeling of light from all other extragalactic point sources. 



2. Statistical Methodology 

In §2.1, we present a very general, Bayesian inference formalism with which afterglow models 
(and any other model for that matter) can be tested, and with which the parameter values 
of acceptable models can be constrained, given the available photometry. In §2.2, we develop 
and present a formalism for the construction of Bayesian prior probability distributions from 
multi-dimensional data sets, which we draw on extensively in §4 and §5. For a deeper discussion 
of Bayesian inference, we refer the reader to an excellent review by Loredo (1992). 



2.1. Bayesian Inference 

2.1.1. Bayes' Theorem 

Bayes' theorem states: 

$$p(H|DI) = \frac{p(H|I)\,p(D|HI)}{p(D|I)}, \tag{1}$$

where $H$ is the hypothesis, or model, being considered, $D$ is the data, and $I$ is any available 
prior information. Hence, Bayes' theorem states that the probability of a given hypothesis, 
$p(H|DI)$, given the data and any available prior information, is proportional to the product of the 
probability of the hypothesis, $p(H|I)$, given the prior information, and the probability of the data, 
$p(D|HI)$, given the hypothesis and the prior information. The quantity $p(H|DI)$ is called the 
posterior probability distribution, the quantity $p(H|I)$ is called the prior probability distribution, 
and the quantity $p(D|HI)$ - sometimes denoted $\mathcal{L}(H)$ - is called the likelihood function. 

The quantity $p(D|I)$ normalizes the posterior. Let the hypothesis, or model, $H$, be described 
by a set of parameters $\theta$. Then, Bayes' theorem reads: 

$$p(\theta|DI) = \frac{p(\theta|I)\,p(D|\theta I)}{p(D|I)}. \tag{2}$$

Normalization demands that 

$$\int_\theta p(\theta|DI)\,d\theta = 1; \tag{3}$$

hence, $p(D|I)$ is given by 

$$p(D|I) = \int_\theta p(\theta|I)\,p(D|\theta I)\,d\theta. \tag{4}$$

Consequently, given a prior (see §2.1.2) and a likelihood function (see §2.1.3), a normalized 
posterior may be computed. 
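
As an illustration of Equations (2)-(4), the following is a minimal sketch that normalizes a posterior on a one-dimensional grid. The toy problem (a single measurement of 1.2 ± 0.5, a flat prior on [-5, 5]) and the names `gaussian` and `grid_posterior` are assumptions made for this example only; the integral of Equation (4) is approximated by a simple Riemann sum.

```python
import math

def gaussian(x, mean, sigma):
    """Normalized Gaussian G(x, mean, sigma)."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def grid_posterior(theta_grid, prior, likelihood):
    """Return (posterior, evidence) on a uniform grid of theta values.

    The evidence p(D|I) of Equation (4) is approximated by a Riemann sum,
    and the posterior of Equation (2) is then normalized as in Equation (3).
    """
    step = theta_grid[1] - theta_grid[0]
    unnormalized = [prior(t) * likelihood(t) for t in theta_grid]
    evidence = sum(unnormalized) * step
    return [u / evidence for u in unnormalized], evidence

# Hypothetical toy problem: one measurement, 1.2 +/- 0.5, of a single parameter theta,
# with a flat prior on [-5, 5] (prior density 1/10).
theta_grid = [-5.0 + 0.01 * i for i in range(1001)]
prior = lambda theta: 0.1
likelihood = lambda theta: gaussian(1.2, theta, 0.5)

posterior, evidence = grid_posterior(theta_grid, prior, likelihood)
most_probable = theta_grid[posterior.index(max(posterior))]
```

In more than a few dimensions a grid of this kind becomes expensive, and one would normally switch to a sampling method; the grid is used here only because it mirrors the integrals directly.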



2.1.2. The Prior 

Let $\{\theta\}$ denote the region over which the parameters $\theta$ are integrated in Equations (3) and (4). 
The prior, $p(\theta|I)$, describes how any available pre-existing information constrains the values of the 
parameters $\theta$, or equivalently, how any available pre-existing information weights the parameter 
space $\{\theta\}$, and consequently, reduces the volume of the solution space, a priori. 

If no prior information is available, one usually takes the prior to be flat within a region 
$\{\theta_{phys}\} \subset \{\theta\}$ where the values of the parameters $\theta$ are considered to be physically plausible; the 
prior is taken to be zero everywhere else: 

$$p(\theta|I) = \begin{cases} \left[\int_{\theta_{phys}} d\theta\right]^{-1} & (\theta \in \{\theta_{phys}\}) \\ 0 & (\text{otherwise}) \end{cases}; \tag{5}$$

here, the volume integral normalizes the prior.¹ The flat prior weights, a priori, all physically- 
plausible solutions equally, and gives no weight to physically-implausible solutions. 

¹ Here, we consider only linearly flat priors; however, logarithmically flat priors are also used, particularly when 
$\{\theta_{phys}\}$ spans orders of magnitude. 

As an example, consider the case of a two-parameter model, where the parameters, x and y, 
are physically unrelated. In this case, the prior factorizes: 

$$p(x,y|I) = p(x|I)\,p(y|I). \tag{6}$$

Furthermore, suppose that prior information states that the possible values of x are normally 
distributed with a mean of a and a standard deviation of b, but that no prior information is 
available on the value of y, other than that values of $y < y_l$ and $y > y_u$ are considered physically 
implausible. Then, the prior for the parameter x, given the prior information a and b, is given by 

$$p(x|a,b) = G(x,a,b), \tag{7}$$

where $G(x,a,b)$ is a normalized Gaussian distribution, given by 

$$G(x,a,b) = \frac{1}{\sqrt{2\pi}\,b} \exp\left[-\frac{1}{2}\left(\frac{x-a}{b}\right)^2\right], \tag{8}$$

and the prior for the parameter y, given the prior information $y_l$ and $y_u$, is given by 

$$p(y|y_l,y_u) = F(y,y_l,y_u), \tag{9}$$

where $F(y,y_l,y_u)$ is a flat prior, given by 

$$F(y,y_l,y_u) = \begin{cases} (y_u - y_l)^{-1} & (y_l < y < y_u) \\ 0 & (\text{otherwise}) \end{cases}. \tag{10}$$

We make extensive use of Gaussian and flat priors in this paper, particularly in §2.2. We 
present specific priors for parameters that describe the effects of extinction and absorption along 
the lines of sight to bursts in §4 and §5, respectively. 
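
The Gaussian and flat priors of Equations (8) and (10), and the factorized prior of Equation (6), can be sketched as follows. The function names `G`, `F`, and `prior_xy` simply mirror the notation above and are otherwise assumptions of this example.

```python
import math

def G(x, a, b):
    """Normalized Gaussian of Equation (8): mean a, standard deviation b."""
    return math.exp(-0.5 * ((x - a) / b) ** 2) / (math.sqrt(2.0 * math.pi) * b)

def F(y, y_l, y_u):
    """Flat prior of Equation (10): constant inside (y_l, y_u), zero outside."""
    return 1.0 / (y_u - y_l) if y_l < y < y_u else 0.0

def prior_xy(x, y, a, b, y_l, y_u):
    """Factorized two-parameter prior of Equation (6) for physically unrelated x and y."""
    return G(x, a, b) * F(y, y_l, y_u)

# Example evaluation with illustrative values: x ~ N(0, 1), y flat on (0, 2).
value_inside = prior_xy(0.0, 0.5, 0.0, 1.0, 0.0, 2.0)
value_outside = prior_xy(0.0, 3.0, 0.0, 1.0, 0.0, 2.0)
```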



2.1.3. The Likelihood Function 

The likelihood function, $p(D|\theta I)$, describes how any available data constrain the values of 
the parameters $\theta$, or equivalently, how any available data weight the parameter space $\{\theta\}$, and 
consequently, reduce the volume of the solution space. Hence, the posterior, $p(\theta|DI)$, 
which is proportional to the product of the prior and the likelihood function, describes how prior 
information and data jointly constrain the values of the parameters $\theta$, or equivalently, how prior 
information and data jointly weight the parameter space $\{\theta\}$, and consequently, jointly reduce the 
volume of the solution space. 

We now consider the form of the likelihood function for an unspecified afterglow model 
(and for any other spectral and temporal model for that matter). Let $F_\nu(\nu,t;\theta)$ be the model's 
prediction for the spectral flux of an afterglow; $F_\nu(\nu,t;\theta)$ is a function of frequency of observation, 
$\nu$, time of observation, $t$, and the model parameters, $\theta$, which should include parameters that 
describe the effects of extinction and absorption along the line of sight. Given measured spectral 
fluxes, the likelihood function is given by 

$$p(D|\theta I) = \prod_{n=1}^{N} G_n[F_\nu(\nu_n,t_n;\theta), F_{\nu,n}, \sigma_{F_\nu,n}], \tag{11}$$

where $F_{\nu,n}$ is the nth measured spectral flux, $\sigma_{F_\nu,n}$ is the measured 1-σ uncertainty associated 
with this spectral flux, and $G_n[F_\nu(\nu_n,t_n;\theta), F_{\nu,n}, \sigma_{F_\nu,n}]$ is a normalized Gaussian distribution, 
given by Equation (8). 
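
Equation (11) is usually evaluated as a sum of logarithms to avoid numerical underflow. The sketch below does this for a hypothetical stand-in model, a power law in frequency and time; this model, its parameter names, and the synthetic, noise-free "measurements" are assumptions made purely for illustration and are not taken from this paper.

```python
import math

def log_gaussian(x, mean, sigma):
    """Logarithm of the normalized Gaussian of Equation (8)."""
    return -0.5 * ((x - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def power_law_flux(nu, t, theta):
    """Hypothetical model F_nu = F0 * t**(-alpha) * nu**(-beta), in arbitrary units."""
    f0, alpha, beta = theta
    return f0 * t ** (-alpha) * nu ** (-beta)

def log_likelihood(model, theta, data):
    """Logarithm of Equation (11) for data rows (nu_n, t_n, flux_n, sigma_n)."""
    return sum(log_gaussian(model(nu, t, theta), flux, sigma)
               for nu, t, flux, sigma in data)

# Synthetic, noise-free fluxes generated from the model itself, with 10% uncertainties.
theta_true = (10.0, 1.2, 0.8)
data = []
for nu, t in [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 4.0)]:
    flux = power_law_flux(nu, t, theta_true)
    data.append((nu, t, flux, 0.1 * flux))

ll_true = log_likelihood(power_law_flux, theta_true, data)
ll_offset = log_likelihood(power_law_flux, (10.0, 1.5, 0.8), data)
```

Because the synthetic fluxes lie exactly on the model, `ll_true` is the largest value the log-likelihood can take for this data set, and any other parameter choice, such as the one used for `ll_offset`, gives a smaller value.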



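2.1.4. Model Comparison 

Model comparison allows one to assess the relative probability of two or more models; 
consequently, this procedure may be used to reject non-viable models. Here, we consider the case 
of only two models; however, one can easily generalize the following procedure to the case of 
multiple models. 

Consider two models, $H_\theta$ and $H_\phi$, that are described by two sets of parameters, $\theta$ and $\phi$, 
respectively. The relative probability of model $H_\theta$ to model $H_\phi$ is called the odds ratio, and is 
given by 

$$O_{\theta\phi} = \frac{\int_\theta p(\theta|DI)\,d\theta}{\int_\phi p(\phi|DI)\,d\phi} \tag{12}$$

$$= \frac{\int_\theta p(\theta|I)\,p(D|\theta I)\,d\theta}{\int_\phi p(\phi|I)\,p(D|\phi I)\,d\phi}. \tag{13}$$

Normalization demands that 

$$\int_\theta p(\theta|DI)\,d\theta + \int_\phi p(\phi|DI)\,d\phi = 1; \tag{14}$$

hence, given only models $H_\theta$ and $H_\phi$, the probability in favor of model $H_\theta$ is 

$$\int_\theta p(\theta|DI)\,d\theta = \frac{O_{\theta\phi}}{1 + O_{\theta\phi}}, \tag{15}$$

and the probability in favor of model $H_\phi$ is 

$$\int_\phi p(\phi|DI)\,d\phi = \frac{1}{1 + O_{\theta\phi}}. \tag{16}$$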
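
A minimal numerical sketch of the odds ratio and of the two model probabilities follows. The two one-parameter models (a constant flux and a flux that grows linearly with time), the flat priors on [0, 2], and the four synthetic measurements are all hypothetical choices for this example; the integrals are approximated by Riemann sums.

```python
import math

def gaussian(x, mean, sigma):
    """Normalized Gaussian G(x, mean, sigma)."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def evidence(model, prior, grid, data):
    """Riemann-sum estimate of the integral of prior times likelihood for one model."""
    step = grid[1] - grid[0]
    total = 0.0
    for value in grid:
        likelihood = 1.0
        for t, flux, sigma in data:
            likelihood *= gaussian(model(t, value), flux, sigma)
        total += prior(value) * likelihood
    return total * step

# Synthetic measurements (time, flux, 1-sigma uncertainty), roughly constant in time.
data = [(1.0, 1.0, 0.1), (2.0, 1.1, 0.1), (3.0, 0.9, 0.1), (4.0, 1.0, 0.1)]
grid = [0.001 * i for i in range(2001)]        # parameter values from 0 to 2
flat = lambda value: 0.5                       # flat prior density on [0, 2]

constant_model = lambda t, theta: theta        # H_theta: flux = theta
linear_model = lambda t, phi: phi * t          # H_phi:   flux = phi * t

odds = evidence(constant_model, flat, grid, data) / evidence(linear_model, flat, grid, data)
prob_theta = odds / (1.0 + odds)               # probability in favor of H_theta
prob_phi = 1.0 / (1.0 + odds)                  # probability in favor of H_phi
```

For these synthetic data the constant model is strongly favored, so `odds` is much greater than one and `prob_theta` is close to unity.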



2.1.5. Parameter Estimation 

Parameter estimation allows one to constrain parameter values of acceptable models. This 
procedure has two parts: marginalization and the determination of credible regions. 

Consider a single model, $H$, that is described by two sets of parameters: interesting 
parameters, $\theta$, whose values one wishes to constrain, and uninteresting parameters, $\phi$, whose 
values one does not need to constrain. Then, Bayes' theorem reads: 

$$p(\theta\phi|DI) = \frac{p(\theta\phi|I)\,p(D|\theta\phi I)}{p(D|I)}. \tag{17}$$

The posterior of the interesting parameters, $p(\theta|DI)$, is given by integrating the full posterior, 
$p(\theta\phi|DI)$, over the uninteresting parameters, $\phi$, and by then normalizing the resulting distribution: 

$$p(\theta|DI) = \frac{\int_\phi p(\theta\phi|I)\,p(D|\theta\phi I)\,d\phi}{\int_\theta\int_\phi p(\theta\phi|I)\,p(D|\theta\phi I)\,d\theta\,d\phi}. \tag{18}$$

This procedure is called marginalization. 
Credible regions are determined by integrating the posterior from the most probable region 
of $\{\theta\}$ to the least probable region of $\{\theta\}$ until p % of the distribution has been integrated: 

$$\int_{\theta_p} p(\theta|DI)\,d\theta = p\,\%, \tag{19}$$

where $\{\theta_p\} \subset \{\theta\}$ such that $p(\theta_1|DI) \ge p(\theta_2|DI)$ for any $\theta_1 \in \{\theta_p\}$ and for any $\theta_2 \in \{\theta\} - \{\theta_p\}$. 
The region $\{\theta_p\}$ is called the p % credible region of the parameters $\theta$, or the solution space. Of 
course, one can imagine infinitely many regions over which integration 
yields p % of the distribution; however, by integrating over the most probable region of $\{\theta\}$, 
one guarantees that the volume of $\{\theta_p\}$ is minimal, and that $\{\theta_p\}$ is uniquely defined. 
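
Marginalization and the determination of a credible region can be sketched on a grid as follows. The joint posterior used here, a product of two Gaussians whose marginal in $\theta$ is a unit Gaussian, is a hypothetical toy chosen so that the 68.27 % credible region is known in advance to be approximately [-1, 1]; the function names are likewise assumptions of this example.

```python
import math

def gaussian(x, mean, sigma):
    """Normalized Gaussian G(x, mean, sigma)."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def marginalize(joint, dtheta, dphi):
    """Integrate a gridded joint posterior over phi (columns) and renormalize over theta."""
    marginal = [sum(row) * dphi for row in joint]
    norm = sum(marginal) * dtheta
    return [m / norm for m in marginal]

def credible_region(density, cell, p):
    """Indices of the grid cells, taken from most to least probable, that enclose a fraction p."""
    order = sorted(range(len(density)), key=lambda i: density[i], reverse=True)
    region, mass = [], 0.0
    for i in order:
        region.append(i)
        mass += density[i] * cell
        if mass >= p:
            break
    return sorted(region)

# Toy joint posterior: theta ~ N(0, 1) and phi | theta ~ N(theta, 0.5).
dtheta, dphi = 0.01, 0.1
thetas = [-5.0 + dtheta * i for i in range(1001)]
phis = [-8.0 + dphi * j for j in range(161)]
joint = [[gaussian(t, 0.0, 1.0) * gaussian(f, t, 0.5) for f in phis] for t in thetas]

marginal = marginalize(joint, dtheta, dphi)
region = credible_region(marginal, dtheta, 0.6827)
lower, upper = thetas[region[0]], thetas[region[-1]]
```

Sorting the cells by posterior density before accumulating them is what makes the resulting region the smallest one that encloses the requested fraction. For a multi-modal posterior the selected cells need not be contiguous, in which case reporting only `lower` and `upper` would be misleading.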



2.2. Constructing Priors from Multi-Dimensional Data Sets 

In §3, we present an eight-parameter model that describes the effects of extinction by dust 
along the lines of sight to bursts (and along the lines of sight to all extragalactic point sources for 
that matter). However, without a prior that weights this parameter space such that the volume 
of the solution space is reduced significantly, a priori, this model has very little predictive power. 
Fortunately, a considerable amount of prior information - in the form of fitted values for six of 
these eight parameters from 166 measured Milky Way and Magellanic Cloud extinction curves, 
and fitted values for one of the two remaining parameters from 79 of these extinction curves - 
exists; we describe this multi-dimensional data set in §4. In this section, we develop and present 
a formalism by which priors can be extracted from multi-dimensional data sets; we draw on this 
formalism extensively in §4 and §5. The extraction of a simply-formulated prior from, for example, 
the above, large, multi-dimensional data set greatly facilitates the incorporation of this prior 
information in future afterglow analyses. We begin with a sequence of four illustrative examples in 
§2.2.1, the last three of which are particularly relevant to our construction of the dust extinction 
curve prior in §4. 

2.2.1. Examples in Three Dimensions and their Generalization 

With the first example, we describe the form that the prior should take in the ideal case of 
one (or more) of the quantities in the data set being fully determined by other quantities in the 
data set, and of this relation between these quantities being either known or easily determined 
from the data set. With the second example, we describe the form that the prior should take in 
the less ideal case of this relation between these quantities existing, but of its existence not being 
known or easily determined from the data set, although correlations between subsets of these 
quantities are determinable from the data set. With the third example, we describe the form that 
the prior should take in the related case of this relation not being determinable from the data 
set because it involves quantities that are not in the data set, although correlations between the 
quantities that are in the data set, or subsets of these quantities, are determinable from the data 
set. This last example is particularly realistic in that one often deals with physical processes, like 
dust extinction, that, although understood in general, many of the details of which depend on 
quantities whose relevance has not even been postulated yet, let alone whose values have been 
measured. With the final example, we describe the form that the prior should take in the event 
that data selection effects, either due to instrumental limitations or due to how the sample was 
selected at a more human level, artificially constrain the values of quantities in the data set. We 
then discuss our generalization of these examples into a procedure. 

Example 1. Consider a three-dimensional data set that consists of measured values of the 
parameters w, x, and y. Furthermore, suppose that these parameters are related by y = x + w, 
and that w and x are physically-unrelated parameters whose measured values are distributed 
as the Gaussians $G(w,0,0.1)$ and $G(x,0,1)$, i.e., as w = 0 ± 0.1 and x = 0 ± 1. Consequently, 
the measured values of the parameter y should also be distributed as a Gaussian, namely, 
$G(y,0,1.005)$. In this case, the prior that best represents this data set is given by 

$$p(w,x,y|I) = G(w,0,0.1)\,G(x,0,1)\,\delta(y - x - w). \tag{20}$$

This prior weights the three-dimensional parameter space, consequently reducing the volume of 
the solution space to a localized region of a two-dimensional plane, a priori. 

Example 2. Suppose now that the relation y = x + w exists, but that its existence has not yet 
been determined. One way to learn of this relation is to plot the data in three dimensions (or to 
plot the data in two dimensions and use perspective, or different symbols, or different colors, etc., 
to represent the third dimension). However, this approach is increasingly difficult to implement 
in increasingly-higher dimensions. Another approach is to probe the data set mathematically. 
However, without prior knowledge of the form of the relation, let alone knowledge of its existence, 
this approach also can fail, particularly if the relation is non-linear in form. Consequently, we 
now consider the form that the prior should take in the event that the relation y = x + w is not 
known. In this case, one approach is to plot the data two parameters at a time. Having done 
this, one would immediately notice that the parameters x and y are strongly correlated, though 
it is unlikely that one would notice the weaker correlation between the parameters w and y, since 
the values of the parameter w span a much smaller range than do the values of the parameter x. 
Finally, the parameters w and x also should appear to be uncorrelated, since we stated above that 
they are physically unrelated. From this information, one can construct the following prior: 

$$p(w,x,y|I) = G(w,0,0.1)\,G(x,0,1)\,G(y,x,0.1), \tag{21}$$

where the last two factors describe the distribution of the data in the x-y plane. Although this 
prior does not reduce the volume of the solution space as significantly as the above prior does, it 
certainly reduces it more than would the prior that one would construct if no correlations were 
noticed, 

$$p(w,x,y|I) = G(w,0,0.1)\,G(x,0,1)\,G(y,0,1.005), \tag{22}$$

and it certainly reduces the volume of the solution space significantly more than would the prior 
one would construct if the prior information were altogether ignored, i.e., if a flat prior were 
adopted (§2.1.2): 

$$p(w,x,y|I) = F(w,w_l,w_u)\,F(x,x_l,x_u)\,F(y,y_l,y_u). \tag{23}$$

Here, $w_l < w < w_u$, $x_l < x < x_u$, and $y_l < y < y_u$ define the ranges over which the values of these 
parameters are considered to be physically plausible. 
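
The numbers quoted in Examples 1 and 2 can be checked with a short Monte Carlo simulation. The sketch below draws synthetic values of w and x, forms y = x + w, and estimates the width of the y distribution (expected to be about 1.005), the scatter of y about x (about 0.1), and the pairwise correlation coefficients. The sample size, the random seed, and the helper names are arbitrary choices for this example.

```python
import math
import random

def std(values):
    """Population standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def corr(u, v):
    """Pearson correlation coefficient of two equal-length samples."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / len(u)
    return cov / (std(u) * std(v))

random.seed(1)
n = 100000
w = [random.gauss(0.0, 0.1) for _ in range(n)]   # w = 0 +/- 0.1
x = [random.gauss(0.0, 1.0) for _ in range(n)]   # x = 0 +/- 1
y = [xi + wi for xi, wi in zip(x, w)]            # the relation y = x + w

std_y = std(y)                                   # width of G(y, 0, 1.005)
scatter_about_x = std([yi - xi for yi, xi in zip(y, x)])   # width of G(y, x, 0.1)
corr_xy, corr_wy, corr_wx = corr(x, y), corr(w, y), corr(w, x)
```

The strong x-y correlation and the much weaker w-y correlation are what a two-parameters-at-a-time inspection of such a data set would reveal, which is the situation that Equation (21) encodes.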

Example 3. Consider now the related case in which the parameter w either is not or cannot be 
measured, and in fact, the very relevance of the parameter to the physical process at hand might 
not even be known. In this case, the prior described by Equation (21) should be replaced by 

$$p(w,x,y|I) = F(w,w_l,w_u)\,G(x,0,1)\,G(y,x,0.1), \tag{24}$$

if w is one of the model parameters, or by 

$$p(x,y|I) = G(x,0,1)\,G(y,x,0.1), \tag{25}$$

if w is not one of the model parameters. 

Example 4. Finally, when constructing priors from data sets, one must be very careful that data 
selection effects do not bias the priors. For example, suppose that the measured distribution of 
the parameter values of x merely reflects how the data were sampled, and not how the parameter 
values of x are intrinsically distributed. In this case, the above prior should be replaced by 

$$p(x,y|I) = F(x,x_l,x_u)\,G(y,x,0.1). \tag{26}$$

However, in this case, the factor $G(y,x,0.1)$ is an extrapolation beyond the range of the measured 
values of the parameter x, and must be treated as such. 

When constructing a prior from a multi-dimensional data set in general, we adopt the 
following procedure: (1) we plot two- and sometimes three-dimensional subsets of the data to 
facilitate the identification of correlations between parameters; (2) if correlations are found, say 
between pairs of parameters, we determine the two-dimensional distributions that describe these 
subsets of the data; we also determine the one-dimensional distributions of the values of all of the 
parameters; and (3) we use this information to construct a prior for the full parameter space, as 
in the above examples, while being mindful of data selection effects. How to go about steps (1) 
and (3) should be clear; how to go about step (2) - the construction of two-dimensional priors, as 
well as one- and three-dimensional priors - we explain in §2.2.2 and §2.2.3. 
2.2.2. Constructing Priors from Two-Dimensional Data Sets 

Suppose that two parameters, x and y, are correlated, i.e., that the measured values of these 
parameters are scattered about a curve, $y = y_c(x;\theta_m)$, where $\theta_m$ are M parameters that describe 
this curve. The scatter of these points about this curve can be both due to measurement errors, in 
which case the scatter is referred to as intrinsic scatter, and due to weaker dependences of either 
of the parameters x or y on other, yet-unmeasured, and even yet-unknown parameters (e.g., the 
parameter w in Example 3 of §2.2.1), in which case the scatter is referred to as extrinsic scatter. 
Below, we take all of these scatters to be normally distributed and uncorrelated. Finally, let 
$g(x,y)\delta(y - y_c)$ be the intrinsic density of points along the curve $y = y_c(x;\theta_m)$, and let $f(x,y)$ 
be the selection function, i.e., the efficiency at which given values of the parameters x and y are 
observed. We now construct a prior that describes the correlation between the parameters x and 
y. 

We model the intrinsic density of points in the x'-y' plane by convolving the intrinsic density 
of points along the curve $y = y_c(x;\theta_m)$, i.e., $g(x,y)\delta(y - y_c)$, with the two-dimensional Gaussian 
smearing function $G(x',x,\sigma_x)G(y',y,\sigma_y)$, the scale of which is parameterized by 1-σ extrinsic 
scatters $\sigma_x$ and $\sigma_y$: 

$$p_{int}(x',y'|\theta_m,\sigma_x,\sigma_y) = \int_x\int_y g(x,y)\,\delta(y - y_c)\,G(x',x,\sigma_x)\,G(y',y,\sigma_y)\,dx\,dy \tag{27}$$

$$= \int_s g(x,y_c)\,G(x',x,\sigma_x)\,G(y',y_c,\sigma_y)\,ds, \tag{28}$$

where $ds = \sqrt{dx^2 + dy^2}$ is an element of path length along the length of the curve. The observed 
density of points in the x'-y' plane is then given by 

$$p_{obs}(x',y'|\theta_m,\sigma_x,\sigma_y) = f(x',y')\,p_{int}(x',y'|\theta_m,\sigma_x,\sigma_y) \tag{29}$$

$$= \int_s f(x',y')\,g(x,y_c)\,G(x',x,\sigma_x)\,G(y',y_c,\sigma_y)\,ds. \tag{30}$$

The probability distribution of the nth data point, $(x_n,y_n)$, given 1-σ intrinsic scatters $\sigma_{x,n}$ 
and $\sigma_{y,n}$, is given by 

$$p_n(x',y'|x_n,y_n,\sigma_{x,n},\sigma_{y,n}) = G_n(x',x_n,\sigma_{x,n})\,G_n(y',y_n,\sigma_{y,n}). \tag{31}$$

Hence, the joint probability distribution of a given model and the nth data point is given by 

$$p_n(x',y'|\theta_m,\sigma_x,\sigma_y,x_n,y_n,\sigma_{x,n},\sigma_{y,n}) = p_{obs}(x',y'|\theta_m,\sigma_x,\sigma_y)\,p_n(x',y'|x_n,y_n,\sigma_{x,n},\sigma_{y,n}) \tag{32}$$

$$= \int_s f(x',y')\,g(x,y_c)\,G(x',x,\sigma_x)\,G(y',y_c,\sigma_y)\,G_n(x',x_n,\sigma_{x,n})\,G_n(y',y_n,\sigma_{y,n})\,ds. \tag{33}$$

The joint probability of a given model, i.e., given values of the parameters $\theta_m$, $\sigma_x$, and $\sigma_y$, 
and the nth data point is given by integrating $p_n(x',y'|\theta_m,\sigma_x,\sigma_y,x_n,y_n,\sigma_{x,n},\sigma_{y,n})$ over x' and y': 

$$p_n(\theta_m,\sigma_x,\sigma_y|x_n,y_n,\sigma_{x,n},\sigma_{y,n}) = \int_{x'}\int_{y'}\int_s f(x',y')\,g(x,y_c)\,G(x',x,\sigma_x)\,G(y',y_c,\sigma_y)\,G_n(x',x_n,\sigma_{x,n})\,G_n(y',y_n,\sigma_{y,n})\,dx'\,dy'\,ds. \tag{34}$$

Finally, the joint probability of a given model and all of the data points is given by taking the 
product of the N probabilities $p_n(\theta_m,\sigma_x,\sigma_y|x_n,y_n,\sigma_{x,n},\sigma_{y,n})$: 

$$p(\theta_m,\sigma_x,\sigma_y|x_n,y_n,\sigma_{x,n},\sigma_{y,n}) = \prod_{n=1}^{N}\int_{x'}\int_{y'}\int_s f(x',y')\,g(x,y_c)\,G(x',x,\sigma_x)\,G(y',y_c,\sigma_y)\,G_n(x',x_n,\sigma_{x,n})\,G_n(y',y_n,\sigma_{y,n})\,dx'\,dy'\,ds. \tag{35}$$

This is the prior. In this form, it is a function of M + 2 parameters: $\theta_m$, $\sigma_x$, and $\sigma_y$. 

If the scale over which the selection function $f(x',y')$ varies from constancy is larger than (1) 
the scale of the two-dimensional Gaussian $G(x',x,\sigma_x)G(y',y_c,\sigma_y)$, as measured by $\sigma_x$ and $\sigma_y$, and 
(2) the scale of the two-dimensional Gaussian $G_n(x',x_n,\sigma_{x,n})G_n(y',y_n,\sigma_{y,n})$, as measured by $\sigma_{x,n}$ 
and $\sigma_{y,n}$, then the first two integrations of Equation (35) can be done analytically: 

$$p(\theta_m,\sigma_x,\sigma_y|x_n,y_n,\sigma_{x,n},\sigma_{y,n}) \approx \prod_{n=1}^{N} f_n(x_n,y_n)\int_{x'}\int_{y'}\int_s g(x,y_c)\,G(x',x,\sigma_x)\,G(y',y_c,\sigma_y)\,G_n(x',x_n,\sigma_{x,n})\,G_n(y',y_n,\sigma_{y,n})\,dx'\,dy'\,ds \tag{36}$$

$$= \prod_{n=1}^{N} f_n(x_n,y_n)\int_s g(x,y_c)\,G_n\left(x,x_n,\sqrt{\sigma_x^2+\sigma_{x,n}^2}\right)G_n\left(y_c,y_n,\sqrt{\sigma_y^2+\sigma_{y,n}^2}\right)ds. \tag{37}$$

The final integration, however, is non-trivial. It consists of a path integration through the 
product of two distributions: $g(x,y)\delta(y - y_c)$, the intrinsic density of points along the curve 
$y = y_c(x;\theta_m)$, and the two-dimensional Gaussian $G_n(x,x_n,\sqrt{\sigma_x^2+\sigma_{x,n}^2})\,G_n(y,y_n,\sqrt{\sigma_y^2+\sigma_{y,n}^2})$. 
However, if the scale of this two-dimensional Gaussian, as measured by $\sqrt{\sigma_x^2+\sigma_{x,n}^2}$ and $\sqrt{\sigma_y^2+\sigma_{y,n}^2}$, 
is smaller than (1) the scale over which $y_c(x;\theta_m)$ varies from linearity, and (2) the scale over which 
$g(x,y_c)$ varies from constancy, this integration can be done with relative ease, as we now show. 

Let $(x_{t,n},y_{t,n})$ be the point on the curve $y = y_c(x;\theta_m)$ for which the value of the two- 
dimensional Gaussian, $G_n(x,x_n,\sqrt{\sigma_x^2+\sigma_{x,n}^2})\,G_n(y,y_n,\sqrt{\sigma_y^2+\sigma_{y,n}^2})$, is maximum. At this point, 
the curve $y = y_c(x;\theta_m)$ will be tangent to an iso-contour of the two-dimensional Gaussian, i.e., the 
ellipse given by 

$$\frac{(x - x_n)^2}{\sigma_x^2+\sigma_{x,n}^2} + \frac{(y - y_n)^2}{\sigma_y^2+\sigma_{y,n}^2} = \frac{(x_{t,n} - x_n)^2}{\sigma_x^2+\sigma_{x,n}^2} + \frac{(y_{t,n} - y_n)^2}{\sigma_y^2+\sigma_{y,n}^2}. \tag{38}$$

By repeatedly setting the slope of this tangential ellipse equal to the slope of the curve, the 
tangent point, $(x_{t,n},y_{t,n})$, can be found iteratively; if $y_c(x;\theta_m)$ is indeed slowly varying, only a 
few iterations are required. Now, making use of the first assumption - that $y_c(x;\theta_m)$ does not 
vary significantly from linearity over the scale of the two-dimensional Gaussian - one can replace 
$y_c(x;\theta_m)$ in Equation (37) with the following approximation: 

$$y_c(x;\theta_m) \approx y_{t,n} + s_{t,n}(x - x_{t,n}), \tag{39}$$
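where 

$$s_{t,n} = \left.\frac{\partial y_c(x;\theta_m)}{\partial x}\right|_{x = x_{t,n}} \tag{40}$$

is the slope of $y_c(x;\theta_m)$ (or that of the tangential ellipse) at the tangent point $(x_{t,n},y_{t,n})$. Finally, 
by making use of the second assumption - that $g(x,y_c)$ does not vary significantly from constancy 
over the scale of the two-dimensional Gaussian - one can complete the integration of Equation 
(37) analytically: 

$$p(\theta_m,\sigma_x,\sigma_y|x_n,y_n,\sigma_{x,n},\sigma_{y,n}) \approx \prod_{n=1}^{N} f_n(x_n,y_n)\,g_n(x_n,y_n)\int_s G_n\left[x,x_n,\sqrt{\sigma_x^2+\sigma_{x,n}^2}\right]G_n\left[y_{t,n} + s_{t,n}(x - x_{t,n}),y_n,\sqrt{\sigma_y^2+\sigma_{y,n}^2}\right]ds \tag{41}$$

$$= \prod_{n=1}^{N} f_n(x_n,y_n)\,g_n(x_n,y_n)\,\sqrt{1 + s_{t,n}^2}\;G_n\left[y_n,\,y_{t,n} + s_{t,n}(x_n - x_{t,n}),\,\sqrt{\sigma_y^2+\sigma_{y,n}^2+s_{t,n}^2(\sigma_x^2+\sigma_{x,n}^2)}\right]. \tag{42}$$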
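
The closed form in Equation (42) can be checked numerically for a single data point: the path integral of the product of the two Gaussians along a straight line, with $ds = \sqrt{1 + s^2}\,dx$, should equal $\sqrt{1 + s^2}$ times a single Gaussian with the combined width. The sketch below makes this comparison for arbitrary illustrative values; `a` and `b` stand for $\sqrt{\sigma_x^2+\sigma_{x,n}^2}$ and $\sqrt{\sigma_y^2+\sigma_{y,n}^2}$, the constant $f_n g_n$ is omitted, and all names are assumptions of this example.

```python
import math

def gaussian(x, mean, sigma):
    """Normalized Gaussian G(x, mean, sigma)."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def path_integral(xn, yn, a, b, xt, yt, s, half_width=10.0, steps=20000):
    """Riemann sum of the product of the two Gaussians along y = yt + s (x - xt)."""
    dx = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = xn - half_width + i * dx
        total += gaussian(x, xn, a) * gaussian(yt + s * (x - xt), yn, b)
    return total * dx * math.sqrt(1.0 + s * s)   # ds = sqrt(1 + s^2) dx

def closed_form(xn, yn, a, b, xt, yt, s):
    """Single-point factor of Equation (42), without the constant f_n g_n."""
    width = math.sqrt(b * b + s * s * a * a)
    return math.sqrt(1.0 + s * s) * gaussian(yn, yt + s * (xn - xt), width)

# Illustrative values: data point (0.5, 1.4), line through (0.2, 1.0) with slope 0.7.
numeric = path_integral(0.5, 1.4, 0.3, 0.4, 0.2, 1.0, 0.7)
analytic = closed_form(0.5, 1.4, 0.3, 0.4, 0.2, 1.0, 0.7)
```

The two values agree to high precision for an exactly linear curve; for a curved $y_c(x;\theta_m)$ the agreement degrades as the curvature becomes appreciable over the scale of the Gaussian, which is the first assumption stated above.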



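Normalization of the prior removes the need to determine the value of the constant 
$\prod_{n=1}^{N} f_n(x_n,y_n)\,g_n(x_n,y_n)$; hence, 

$$p(\theta_m,\sigma_x,\sigma_y|x_n,y_n,\sigma_{x,n},\sigma_{y,n}) \propto \prod_{n=1}^{N}\sqrt{1 + s_{t,n}^2}\;G_n\left[y_n,\,y_{t,n} + s_{t,n}(x_n - x_{t,n}),\,\sqrt{\sigma_y^2+\sigma_{y,n}^2+s_{t,n}^2(\sigma_x^2+\sigma_{x,n}^2)}\right]. \tag{43}$$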
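
The evaluation of Equation (43) can be sketched as follows. The text does not spell out the iteration used to find the tangent point, so the scheme below is only one possible choice: at each step the curve is linearized about the current estimate and the tangency condition, $(x - x_n)/(\sigma_x^2+\sigma_{x,n}^2) + s\,(y - y_n)/(\sigma_y^2+\sigma_{y,n}^2) = 0$, is solved for the next estimate. The function names, the fixed iteration count, and the example curves are assumptions of this sketch, and both summed variances are assumed to be positive.

```python
import math

def gaussian(x, mean, sigma):
    """Normalized Gaussian G(x, mean, sigma)."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def tangent_point(curve, slope, xn, yn, var_x, var_y, iterations=50):
    """Estimate the point where y = curve(x) is tangent to an iso-contour of the
    Gaussian centred on (xn, yn). var_x and var_y are the summed variances
    sigma_x^2 + sigma_{x,n}^2 and sigma_y^2 + sigma_{y,n}^2."""
    x = xn
    for _ in range(iterations):
        y, s = curve(x), slope(x)
        # Linearize the curve at x, then solve the tangency condition for the new x.
        x = (xn / var_x + s * (yn - y + s * x) / var_y) / (1.0 / var_x + s * s / var_y)
    return x, curve(x), slope(x)

def log_prior(curve, slope, data, sigma_x, sigma_y):
    """Logarithm of the product in Equation (43) for rows (xn, yn, sigma_xn, sigma_yn)."""
    total = 0.0
    for xn, yn, sxn, syn in data:
        var_x = sigma_x ** 2 + sxn ** 2
        var_y = sigma_y ** 2 + syn ** 2
        xt, yt, st = tangent_point(curve, slope, xn, yn, var_x, var_y)
        width = math.sqrt(var_y + st * st * var_x)
        total += 0.5 * math.log(1.0 + st * st)
        total += math.log(gaussian(yn, yt + st * (xn - xt), width))
    return total

# Illustrative use: three synthetic points lying exactly on the line y = 2x + 1.
data = [(0.0, 1.0, 0.1, 0.1), (1.0, 3.0, 0.1, 0.1), (2.0, 5.0, 0.1, 0.1)]
on_line = log_prior(lambda x: 2.0 * x + 1.0, lambda x: 2.0, data, 0.1, 0.1)
shifted = log_prior(lambda x: 2.0 * x + 2.0, lambda x: 2.0, data, 0.1, 0.1)
```

For a straight line the iteration converges in a single step. For a gently curved $y_c(x;\theta_m)$ it converges in a few steps, consistent with the remark above that only a few iterations are required when the curve is slowly varying; the fixed count of 50 is simply a conservative default.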



2.2.3. Constructing Practical Priors from Two-Dimensional Data Sets and its Generalization 



From the point of view of practicality, Equation (43) has a number of drawbacks. First of all, 
by formulating the prior in this way, we have replaced the two parameters x and y with M + 2, 
intermediate parameters: $\theta_m$, $\sigma_x$, and $\sigma_y$. Secondly, potential users of this prior must have access 
to the 4N pieces of prior information, $x_n$, $y_n$, $\sigma_{x,n}$, and $\sigma_{y,n}$, where N can be a very large number, 
that are required for its computation. Finally, the computation of this prior, although completely 
feasible, is non-trivial: the iterative procedure of finding the tangent point (§2.2.2) must be 
performed N times at every grid point in the (M + 2)-dimensional space that the prior spans. 

These problems can be overcome by instead taking Equation (43) to be a likelihood function, 
and by then applying the statistical methodology of §2.1.5 to constrain the values of the 
intermediate parameters $\theta_m$, $\sigma_x$, and $\sigma_y$, i.e., to reduce the 4N pieces of prior information to 
what we show below to be 2M + 2 representative values, where M is typically a few. Given these 
fitted values, it is a simple matter to construct an approximation to Equation (43) (1) that is 
solely a function of the parameters x and y, (2) that requires only these 2M + 2 values as prior 
information, and (3) that is computationally non-taxing. We do this now; we then generalize these 
results to other dimensions. 



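Let $\hat\theta_m$, $\hat\sigma_x$, and $\hat\sigma_y$ be the best-fit values of $\theta_m$, $\sigma_x$, and $\sigma_y$, and let $\sigma_{\hat\theta_m}$ be the fitted, 1-σ 
uncertainties in the values of $\hat\theta_m$. If one takes these fitted values to be normally distributed and 
uncorrelated, Equation (42) may then be approximated by 

$$p(x,y|\hat\theta_m,\sigma_{\hat\theta_m},\hat\sigma_x,\hat\sigma_y) \approx G[y,\,y_c(x;\hat\theta_m),\,\sigma_y(x;\hat\theta_m,\sigma_{\hat\theta_m},\hat\sigma_x,\hat\sigma_y)], \tag{44}$$

where 

$$\sigma_y(x;\hat\theta_m,\sigma_{\hat\theta_m},\hat\sigma_x,\hat\sigma_y) = \left\{\sum_{m=1}^{M}\left[\sigma_{\hat\theta_m}\left.\frac{\partial y_c(x;\theta_m)}{\partial\theta_m}\right|_{\theta_m=\hat\theta_m}\right]^2 + \left[\hat\sigma_x\left.\frac{\partial y_c(x;\theta_m)}{\partial x}\right|_{\theta_m=\hat\theta_m}\right]^2 + \hat\sigma_y^2\right\}^{1/2}. \tag{45}$$

Here, $\sigma_y(x;\hat\theta_m,\sigma_{\hat\theta_m},\hat\sigma_x,\hat\sigma_y)$ is the quadratic sum of the uncertainty in the curve $y = y_c(x;\theta_m)$ due 
to the uncertainties $\sigma_{\hat\theta_m}$ in the best-fit values $\hat\theta_m$, the uncertainty in the curve $y = y_c(x;\theta_m)$ due 
to the extrinsic scatter $\hat\sigma_x$ in the x dimension, and the uncertainty in the curve $y = y_c(x;\theta_m)$ due 
to the extrinsic scatter $\hat\sigma_y$ in the y dimension. Consequently, Equation (44) (1) is indeed solely a 
function of the parameters x and y, (2) requires only 2M + 2 pieces of prior information, $\hat\theta_m$, $\sigma_{\hat\theta_m}$, 
$\hat\sigma_x$, and $\hat\sigma_y$, and (3) is easily computed. 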
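
A minimal sketch of the practical prior of Equations (44) and (45) follows. It evaluates the partial derivatives of the curve by central finite differences so that any curve function can be supplied; the straight-line curve, the fitted values, and the names `sigma_y_eff` and `practical_prior` are hypothetical and serve only to illustrate the bookkeeping.

```python
import math

def gaussian(x, mean, sigma):
    """Normalized Gaussian G(x, mean, sigma)."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def sigma_y_eff(curve, x, theta_hat, sigma_theta, sigma_x, sigma_y, h=1e-6):
    """Width of Equation (45): quadratic sum of the three uncertainty contributions."""
    variance = sigma_y ** 2
    for m in range(len(theta_hat)):
        up, down = list(theta_hat), list(theta_hat)
        up[m] += h
        down[m] -= h
        dy_dtheta = (curve(x, up) - curve(x, down)) / (2.0 * h)   # derivative w.r.t. theta_m
        variance += (sigma_theta[m] * dy_dtheta) ** 2
    dy_dx = (curve(x + h, theta_hat) - curve(x - h, theta_hat)) / (2.0 * h)
    variance += (sigma_x * dy_dx) ** 2
    return math.sqrt(variance)

def practical_prior(curve, x, y, theta_hat, sigma_theta, sigma_x, sigma_y):
    """Approximate prior of Equation (44), a Gaussian in y about the best-fit curve."""
    width = sigma_y_eff(curve, x, theta_hat, sigma_theta, sigma_x, sigma_y)
    return gaussian(y, curve(x, theta_hat), width)

# Hypothetical best-fit line y_c = theta_1 + theta_2 x with M = 2 fitted parameters.
line = lambda x, theta: theta[0] + theta[1] * x
theta_hat, sigma_theta = [1.0, 0.5], [0.1, 0.05]
width_at_2 = sigma_y_eff(line, 2.0, theta_hat, sigma_theta, 0.2, 0.3)
density_on_curve = practical_prior(line, 2.0, 2.0, theta_hat, sigma_theta, 0.2, 0.3)
```

For the values above the four contributions to the variance are 0.01, 0.01, 0.01, and 0.09, so the width at x = 2 is the square root of 0.12.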

Equation (44) can be improved upon if the selection function, $f(x,y)$, is well understood. 
In this case, the intrinsic density of points along the curve $y = y_c(x;\theta_m)$, i.e., $g(x,y_c)$, can be 
recovered from the observed density of points along the curve $y = y_c(x;\theta_m)$, i.e., $f(x,y_c)g(x,y_c)$, 
in which case, Equation (44) can be replaced with 

$$p(x,y|\hat\theta_m,\sigma_{\hat\theta_m},\hat\sigma_x,\hat\sigma_y) \approx g[x,y_c(x;\hat\theta_m)]\,G[y,\,y_c(x;\hat\theta_m),\,\sigma_y(x;\hat\theta_m,\sigma_{\hat\theta_m},\hat\sigma_x,\hat\sigma_y)]. \tag{46}$$

However, selection functions often are not well understood, as is the case with the data sets that 
we present in §4 and §5; consequently, we do not develop this case further in this paper. 

One-dimensional priors are trivially derived by setting $s_{t,n} = 0$ in Equation (43). In this case, 
Equation (44) reduces to: 

$$p(y|\hat y,\hat\sigma_y) = G(y,\hat y,\hat\sigma_y). \tag{47}$$



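Furthermore, it is not difficult to generalize Equations (43) and (44) to more than two 
dimensions. Let $y = y_c(x_l;\theta_m)$, where $1 \le l \le L$. In this case, one can show that Equation (43) 
generalizes to 

$$p(\theta_m,\sigma_{x_l},\sigma_y|x_{l,n},y_n,\sigma_{x_l,n},\sigma_{y,n}) \propto \prod_{n=1}^{N}\sqrt{1 + \sum_{l=1}^{L} s_{l,t,n}^2}\;G_n\left[y_n,\,y_{t,n} + \sum_{l=1}^{L} s_{l,t,n}(x_{l,n} - x_{l,t,n}),\,\sqrt{\sigma_y^2+\sigma_{y,n}^2+\sum_{l=1}^{L} s_{l,t,n}^2(\sigma_{x_l}^2+\sigma_{x_l,n}^2)}\right], \tag{48}$$

where 

$$s_{l,t,n} = \left.\frac{\partial y_c(x_l;\theta_m)}{\partial x_l}\right|_{x_l = x_{l,t,n}}, \tag{49}$$

and Equation (44) generalizes to 

$$p(x_l,y|\hat\theta_m,\sigma_{\hat\theta_m},\hat\sigma_{x_l},\hat\sigma_y) \approx G[y,\,y_c(x_l;\hat\theta_m),\,\sigma_y(x_l;\hat\theta_m,\sigma_{\hat\theta_m},\hat\sigma_{x_l},\hat\sigma_y)], \tag{50}$$

where 

$$\sigma_y(x_l;\hat\theta_m,\sigma_{\hat\theta_m},\hat\sigma_{x_l},\hat\sigma_y) = \left\{\sum_{m=1}^{M}\left[\sigma_{\hat\theta_m}\left.\frac{\partial y_c(x_l;\theta_m)}{\partial\theta_m}\right|_{\theta_m=\hat\theta_m}\right]^2 + \sum_{l=1}^{L}\left[\hat\sigma_{x_l}\left.\frac{\partial y_c(x_l;\theta_m)}{\partial x_l}\right|_{\theta_m=\hat\theta_m}\right]^2 + \hat\sigma_y^2\right\}^{1/2}. \tag{51}$$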



3. The Dust Extinction Curve Model 

We now present an eight-parameter model that describes the effects of extinction by dust 
along lines of sight through our galaxy, and by redshifting this model, along lines of sight through 
burst host galaxies (and along lines of sight through the host galaxies of all extragalactic point 
sources for that matter). This model is a combination of the two-parameter, IR and optical
extinction curve model of Cardelli, Clayton, & Mathis (1989), and the six-parameter, UV
extinction curve model of Fitzpatrick & Massa (1988). We present the IR and optical extinction
curve model in §3.1; we present the UV extinction curve model in §3.2. In §3.3, we modify these 
models to include the effect at far-UV (FUV) wavelengths of absorption by H I in galaxies. 



3.1. λ > 3000 Å

Using UBVRIJHKL photometry of 29 reddened Milky Way OB stars (Clayton & Mathis
1988; Clayton & Cardelli 1988), and UV extinction curves that had been fitted to International
Ultraviolet Explorer (IUE) spectra of 45 Milky Way OB stars (Fitzpatrick & Massa 1988; see
§3.2), Cardelli, Clayton, & Mathis (1988, 1989) constructed an empirical, two-parameter, IR
through FUV extinction curve model. The two parameters are $A_V$ and $R_V$. The former parameter
normalizes the extinction curve at the V band; the latter parameter, defined by

$$R_V \equiv \frac{A_V}{E(B-V)} \tag{52}$$

$$\phantom{R_V} = \left( \frac{A_B}{A_V} - 1 \right)^{-1}, \tag{53}$$



is a measure of the amount of extinction at the B band relative to that at the V band. The
standard diffuse interstellar medium (ISM) value of $R_V$ is 3.1; however, the value of $R_V$ is known
to vary with the type of interstellar environment. For example, $R_V \sim 4 - 5$ is typical of dense
clouds.
The Cardelli, Clayton, & Mathis (1989) extinction curve is given by:

$$\frac{A_\lambda}{A_V} = a(x) + \frac{b(x)}{R_V}, \tag{54}$$

where $x = (\lambda / 1\ \mu{\rm m})^{-1}$, and $a(x)$ and $b(x)$ are empirical expressions given by

$$a(x) = \begin{cases} 0.574x^{1.61} & (0.3 < x < 1.1) \\ 1 + 0.17699y - 0.50447y^2 - 0.02427y^3 + 0.72085y^4 \\ \quad + 0.01979y^5 - 0.77530y^6 + 0.32999y^7 & (1.1 < x < 3.3) \end{cases} \tag{55}$$

and

$$b(x) = \begin{cases} -0.527x^{1.61} & (0.3 < x < 1.1) \\ 1.41338y + 2.28305y^2 + 1.07233y^3 - 5.38434y^4 \\ \quad - 0.62251y^5 + 5.30260y^6 - 2.09002y^7 & (1.1 < x < 3.3) \end{cases}, \tag{56}$$

where $y = x - 1.82$. Cardelli, Clayton, & Mathis (1989) also determined expressions for $a(x)$
and $b(x)$ in the wavelength range 3.3 < x < 10 (1000 Å < λ < 3000 Å); however, the Fitzpatrick
& Massa (1988) parameterization of the extinction curve, on which this portion of the Cardelli,
Clayton, & Mathis (1989) parameterization of the extinction curve is largely based, is a more
general description of the extinction curve in the UV. Consequently, we instead adopt the more
general extinction curve model of Fitzpatrick & Massa (1988) at these UV wavelengths (see §3.2).

The extinction curve at wavelengths $\lambda \gtrsim 6000$ Å is generally attributed to absorption and
scattering by classical Van de Hulst grains (Van de Hulst 1957). These are relatively large grains,
with sizes of 1000 - 2000 Å. They are thought to be fluffy, non-spherical composites containing
carbon, silicates, oxides, and vacuum (Mathis 1996, 1998; Dwek 1998). Classical grain extinction
saturates at a wavelength of $\lambda \sim 3000$ Å.
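
As an illustration of how the IR and optical model of this subsection might be evaluated, the following sketch implements $a(x)$, $b(x)$, and $A_\lambda/A_V = a(x) + b(x)/R_V$ using the coefficients quoted above. The function names `ccm_a_b` and `ccm_extinction_ratio` are hypothetical, and treating the range boundaries as inclusive is a choice made here for convenience.

```python
def ccm_a_b(x):
    """Return (a, b) of the Cardelli, Clayton, & Mathis (1989) curve.

    x is the inverse wavelength in units of inverse microns, 0.3 <= x <= 3.3.
    """
    if 0.3 <= x <= 1.1:
        # IR branch: a single power law for both coefficients.
        return 0.574 * x ** 1.61, -0.527 * x ** 1.61
    if 1.1 < x <= 3.3:
        # Optical/near-IR branch: seventh-order polynomials in y = x - 1.82.
        y = x - 1.82
        a_coeffs = [1.0, 0.17699, -0.50447, -0.02427,
                    0.72085, 0.01979, -0.77530, 0.32999]
        b_coeffs = [0.0, 1.41338, 2.28305, 1.07233,
                    -5.38434, -0.62251, 5.30260, -2.09002]
        a = sum(c * y ** k for k, c in enumerate(a_coeffs))
        b = sum(c * y ** k for k, c in enumerate(b_coeffs))
        return a, b
    raise ValueError("x is outside the range 0.3 <= x <= 3.3")

def ccm_extinction_ratio(wavelength_angstrom, r_v=3.1):
    """Return A_lambda / A_V for a wavelength in Angstroms."""
    x = 1.0e4 / wavelength_angstrom  # 1 micron = 10^4 Angstroms
    a, b = ccm_a_b(x)
    return a + b / r_v
```

At $x = 1.82$ the polynomials give $a = 1$ and $b = 0$, so the curve is normalized to unity at the V band for any $R_V$; near the B band ($x \approx 2.27$) they give $a \approx b \approx 1$, which reproduces $A_B/A_V = 1 + 1/R_V$.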



3.2. 1000 Å < λ < 3000 Å

In a series of four papers, Massa & Fitzpatrick (1986) and Fitzpatrick & Massa (1986, 1988,
1990) measured extinction curves for two samples of reddened Milky Way OB stars from IUE
spectra. Their cluster sample consists of 35 stars from five clusters; since these stars were drawn
from similar interstellar environments, their extinction curves are relatively similar. Their program
sample consists of 45 stars from a wide variety of interstellar environments; consequently, the
extinction curves of this sample are more varied. Fitzpatrick & Massa (1988, 1990) found that all
80 extinction curves are well fitted by the following three-component function:

$$\frac{E(\lambda - V)}{E(B-V)} = c_1 + c_2 x + c_3 D(x; \gamma, x_0) + c_4 F(x), \tag{57}$$

where

$$D(x; \gamma, x_0) = \frac{x^2}{(x^2 - x_0^2)^2 + \gamma^2 x^2} \tag{58}$$

and

$$F(x) = \begin{cases} 0.5392(x - 5.9)^2 + 0.05644(x - 5.9)^3 & (x > 5.9) \\ 0 & (x < 5.9) \end{cases}. \tag{59}$$



The first component, $c_1 + c_2 x$, is linear and spans the wavelength range of the data: 1000 Å
< λ < 3000 Å. The second component, $D(x;\gamma,x_0)$, is a functional form called the Drude profile;
however, in this context, it is often called the 2175 Å bump, or the UV bump. The third
component, $F(x)$, is an empirical expression called the FUV curvature component, or the FUV
non-linear component. We depict all three of these components in Figure 1.
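
The three components can be sketched in code as follows. This is an illustrative implementation only: the function names are hypothetical, and the Drude profile is written in the standard form used by Fitzpatrick & Massa, $x^2 / [(x^2 - x_0^2)^2 + \gamma^2 x^2]$.

```python
def drude(x, gamma, x0):
    """Drude profile (the UV bump) with center x0 and width gamma."""
    return x ** 2 / ((x ** 2 - x0 ** 2) ** 2 + (gamma * x) ** 2)

def fuv_curvature(x):
    """FUV non-linear component; zero below x = 5.9 inverse microns."""
    if x < 5.9:
        return 0.0
    d = x - 5.9
    return 0.5392 * d ** 2 + 0.05644 * d ** 3

def fm_color_excess_ratio(x, c1, c2, c3, c4, gamma, x0):
    """E(lambda - V) / E(B - V) as the sum of the three components."""
    linear = c1 + c2 * x
    bump = c3 * drude(x, gamma, x0)
    curvature = c4 * fuv_curvature(x)
    return linear + bump + curvature
```

At the bump center, `drude(x0, gamma, x0)` equals $1/\gamma^2$, so the bump contributes $c_3/\gamma^2$ to the curve at its peak.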

Although the Fitzpatrick & Massa (1988) parameterization of the extinction curve is first
and foremost an empirically driven fitting function, it is not devoid of physical significance. For
example, the Drude profile is the functional form of the absorption cross section of a forced-damped
harmonic oscillator; it reduces to a Lorentzian near resonance (Jackson 1962). Fitzpatrick &
Massa (1986) found that the Drude profile fits the data better than does a pure Lorentzian;
Lorentzian profiles had been used previously (Savage 1975; Seaton 1979). The Drude profile is a
function of $x_0$, the bump's center, and $\gamma$, the bump's width.

The linear component of the extinction curve is generally attributed to a distribution of grain
sizes; the larger grains, with sizes perhaps as large as the classical grains, are responsible for
extinction in the near-UV, and the smaller grains, with sizes perhaps as small as 100 Å or less,
are responsible for extinction in the FUV. These grains have been interpreted either as the tail
end of the classical grain population (e.g., Mathis, Rumpl, & Nordsieck 1977), or as a separate
population altogether (e.g., Hong & Greenberg 1980). The parameters $c_1$ and $c_2$ are correlated
(see Figure 2, §4.2); however, this correlation is merely an artifact of the fitting procedure by
which the values of these parameters are determined (Carnochan 1986, see §4.2).

The values of these parameters are known to vary with the type of interstellar environment.
For example, in the diffuse ISM, the values of $c_2$ are in the range 0.6 - 1, while in dense clouds,
the values of $c_2$ extend to lower values: 0 - 1 (e.g., Fitzpatrick & Massa 1988). This difference
is generally attributed to the accretion of small grains onto larger grains, or to the coagulation
of small grains into larger grains, both of which occur most readily in dense clouds (Scalo 1977;
Cardelli, Clayton, & Mathis 1988, 1989; Mathis & Whiffen 1989). In young star-forming regions,
like the Orion Nebula, which are also dense clouds, $c_2 \sim 0$ (e.g., Fitzpatrick & Massa 1988). This
consistent, low value is generally attributed to stellar radiation forces, or to the evaporation of
grains, both of which preferentially remove the smaller grains (McCall 1981; Cardelli & Clayton
1988).

Dense clouds also protect grains from supernova shocks, which preferentially destroy the
larger, classical grains or their mantles, and thus possibly increase the number of grains responsible
for the linear component of the extinction curve (Seab & Shull 1983; Jenniskens & Greenberg
1993; Jones, Tielens, & Hollenbach 1996). Since extinction curves are normalized at the V band,
the removal of classical grains alone guarantees higher values of $c_2$ (e.g., Jenniskens & Greenberg
1993). Indeed, in the Large and Small Magellanic Clouds (LMC and SMC), where old star-forming
regions, like 30 Doradus (an extreme example), are more common than in the Galaxy, the values
of $c_2$ extend to higher values: 0.6 - 2 for the LMC and the SMC wing, and 2 - 2.5 for the SMC
bar (Calzetti, Kinney, & Storchi-Bergmann 1994; Gordon & Clayton 1998; Misselt, Clayton, &
Gordon 1999). In starburst galaxies, the values of $c_2$ are similarly high (Gordon, Calzetti, & Witt
1997).

There is less consensus about the type of grain that is responsible for the UV bump; however,
the fact that for fixed values of the IR and optical extinction curve parameter $R_V$, and of the linear
component parameters $c_1$ and $c_2$, the strength of the UV bump can vary considerably, strongly
suggests that the classical and linear component grains are not responsible for the UV bump (e.g.,
Greenberg & Chlewicki 1983). The UV bump is sometimes attributed to small graphitic grains,
having diameters of ~ 200 Å or less (e.g., Hecht 1986). One property of this model is that the
bump's width, $\gamma$, can vary by a few tens of percent, while its center, $x_0$, can vary by only a few
percent; this is what is observed (e.g., Fitzpatrick & Massa 1986). Another property of this model
is that $c_3$ and $\gamma$ are correlated, which is also observed (see Figure 4, §4.2); however, this correlation
appears to change with the type of interstellar environment (see §4.2).

Equations (57) and (58) show that the height, or strength, of the UV bump is proportional to
$c_3/\gamma^2$. Dense clouds and the diffuse ISM tend to favor strong UV bumps; however, star-forming
regions, both young and old, tend to favor weak UV bumps. In fact, in the SMC bar and starburst
galaxies, no UV bump is typically observed. Young star-forming regions tend to favor weak UV
bumps probably because UV radiation destroys or alters the grains that are responsible for the
UV bump (Jenniskens & Greenberg 1993); old star-forming regions tend to favor weak UV bumps
probably because UV radiation and/or supernova shocks destroy or alter the grains that are
responsible for the UV bump (Gordon, Calzetti, & Witt 1997). These effects can be seen in Figure
5 (see §4.2; see also Clayton, Gordon, & Wolff 2000), where $c_3$ is an approximate measure of the
strength of the UV bump.

Even less is known about the type of grain that is responsible for the FUV non-linear
component of the extinction curve. According to the model of Hecht (1986), the small graphitic
grains that produce the UV bump should have a second plasmon resonance, resulting in a second
and similar bump centered at a wavelength of $\lambda \sim 700 - 800$ Å. Indeed, the shape of the FUV
non-linear component resembles the red wing of a Drude profile (Fitzpatrick & Massa 1988).
Furthermore, this model suggests that the strengths of these two bumps might be correlated (e.g.,
Fitzpatrick & Massa 1988); however, hydrogenation of these grains should largely decouple these
resonances (e.g., Hecht 1986). Fitzpatrick & Massa (1988) and Jenniskens & Greenberg (1993)
have shown that $c_3$ and $\gamma$ are both weakly correlated with $c_4$; however, inclusion of the LMC
and SMC lines of sight suggests that these correlations also depend on the type of interstellar
environment (see Figure 6, §4.2). As in the case of the UV bump parameters, $c_4$ does not correlate
with $R_V$, $c_1$, or $c_2$ (Jenniskens & Greenberg 1993).
Like the IR and optical extinction curve model of Cardelli, Clayton, & Mathis (1989; Equation
54), the UV extinction curve model of Fitzpatrick & Massa (1988; Equation 57) can be written as
a function of $A_\lambda/A_V$, given the following rearrangement of the definition of $R_V$ (Equation 52):

$$\frac{A_\lambda}{A_V} = 1 + \frac{1}{R_V} \frac{E(\lambda - V)}{E(B-V)}. \tag{60}$$

To smoothly link these two models (see also Fitzpatrick 1999), we recommend the following
weighted average of these models between the V band (λ = 5500 Å) and λ = 3000 Å:

$$A_\lambda = \begin{cases} A_{\lambda,CCM} & (x < 1.82) \\ A_{\lambda,CCM} + \frac{x - 1.82}{1.48} \left( A_{\lambda,FM} - A_{\lambda,CCM} \right) & (1.82 < x < 3.3) \\ A_{\lambda,FM} & (x > 3.3) \end{cases}. \tag{61}$$

To summarize, to the limit of current observations, the IR through FUV extinction curve
appears to be most generally modeled by eight parameters: $A_V$, $R_V$, $c_1$, $c_2$, $c_3/\gamma^2$, $c_4$, $\gamma$, and $x_0$;
the UV bump is more naturally parameterized by $c_3/\gamma^2$ (bump height) and $\gamma$ (bump width) than
by $c_3$ and $\gamma$, since the former pair is orthogonal (Jenniskens & Greenberg 1993). In §3.3, we modify
Equation (61) to include the effect at FUV wavelengths of absorption by H I in galaxies; this does
not change the number of parameters. In §4, we show that the volume of the solution space of
this additional, but necessary, eight-dimensional parameter space can be reduced significantly, a
priori, with a prior; without such a prior, Equation (61) has very little predictive power.
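
The linking of the two models can be sketched as follows. Two assumptions are made here and should be checked against the published equations: the color excess ratio is converted to $A_\lambda/A_V$ through $1 + [E(\lambda - V)/E(B-V)]/R_V$, and the weight in the transition region is taken to be linear in $x$, rising from 0 at $x = 1.82$ to 1 at $x = 3.3$. The function names and the callables `a_ccm` and `a_fm` are hypothetical.

```python
def fm_to_extinction_ratio(color_excess_ratio, r_v):
    """Convert E(lambda - V)/E(B - V) to A_lambda/A_V using R_V = A_V/E(B - V)."""
    return 1.0 + color_excess_ratio / r_v

def linked_extinction_ratio(x, a_ccm, a_fm):
    """Weighted average of two extinction models between x = 1.82 and x = 3.3.

    a_ccm and a_fm are callables returning A_lambda/A_V at inverse wavelength x
    for the IR/optical model and the UV model, respectively.
    """
    if x < 1.82:
        return a_ccm(x)
    if x > 3.3:
        return a_fm(x)
    # Assumed linear weight: 0 at the V band, 1 at 3000 Angstroms.
    weight = (x - 1.82) / (3.3 - 1.82)
    return a_ccm(x) + weight * (a_fm(x) - a_ccm(x))
```

With this weight, the combined curve equals the IR and optical model at the V band and the UV model at 3000 Å, so it is continuous at both ends of the transition region.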





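3.3. λ < 1000 Å

The column density of H I in a galaxy along a line of sight is given by

$$N_{\rm H} = 1.5 \times 10^{21}\ {\rm cm^{-2}} \left( \frac{A_V}{1\ {\rm mag}} \right) \left( \frac{R_V}{3.1} \right)^{-1} \left( \frac{\eta}{\eta_{MW}} \right), \tag{62, 63}$$

where $\eta$ is the gas to dust ratio of the galaxy along the line of sight, and $\eta_{MW}$ is the standard
value of this ratio for the Milky Way. The bound-free photo-absorption cross section of ground
state hydrogen as a function of wavelength is given by

$$\sigma_\lambda = \begin{cases} 7.9 \times 10^{-18}\ {\rm cm^2} \left( \frac{\lambda}{912\ {\rm Å}} \right)^3 & (\lambda < 912\ {\rm Å}) \\ 0\ {\rm cm^2} & (\lambda > 912\ {\rm Å}) \end{cases}. \tag{64}$$

Total extinction occurs when $\sigma_\lambda N_{\rm H} \gg 1$, i.e., when

$$A_V \gg 8.4 \times 10^{-5} \left( \frac{R_V}{3.1} \right) \left( \frac{\lambda}{912\ {\rm Å}} \right)^{-3} \left( \frac{\eta}{\eta_{MW}} \right)^{-1}\ {\rm mag}, \tag{65}$$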
and λ < 912 Å. Since this condition is always satisfied along lines of sight through galaxies, at
least into the soft X rays, we replace Equation (61) with:

$$A_\lambda = \begin{cases} A_{\lambda,CCM} & (x < 1.82) \\ A_{\lambda,CCM} + \frac{x - 1.82}{1.48} \left( A_{\lambda,FM} - A_{\lambda,CCM} \right) & (1.82 < x < 3.3) \\ A_{\lambda,FM} & (3.3 < x < 10.96) \\ \infty & (x > 10.96) \end{cases}. \tag{66}$$

Consequently, Equation (66) describes how light is extinguished by dust and absorbed by H I
along lines of sight through galaxies, in the rest frame, into the soft X rays.
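
To see why the H I condition is easily satisfied, the optical depth can be worked out numerically. The sketch below assumes a column density normalization of $1.5 \times 10^{21}$ cm$^{-2}$ per magnitude of $A_V$ at $R_V = 3.1$ and a threshold cross section of $7.9 \times 10^{-18}$ cm$^2$ scaling as $\lambda^3$; the powers of ten are our reading of the equations above, and the function names are hypothetical.

```python
def hi_column_density(a_v, r_v=3.1, gas_to_dust_ratio=1.0):
    """H I column density in cm^-2; gas_to_dust_ratio is eta / eta_MW."""
    return 1.5e21 * a_v * (3.1 / r_v) * gas_to_dust_ratio

def hi_cross_section(wavelength_angstrom):
    """Bound-free cross section of ground state hydrogen in cm^2."""
    if wavelength_angstrom < 912.0:
        return 7.9e-18 * (wavelength_angstrom / 912.0) ** 3
    return 0.0

def hi_optical_depth(wavelength_angstrom, a_v, r_v=3.1, gas_to_dust_ratio=1.0):
    """Optical depth sigma_lambda * N_H along the line of sight."""
    return hi_cross_section(wavelength_angstrom) * hi_column_density(
        a_v, r_v, gas_to_dust_ratio)

def a_v_for_unit_optical_depth(wavelength_angstrom, r_v=3.1, gas_to_dust_ratio=1.0):
    """A_V (mag) at which the optical depth equals one, for wavelengths < 912 A."""
    return 1.0 / hi_optical_depth(wavelength_angstrom, 1.0, r_v, gas_to_dust_ratio)
```

For example, at λ = 456 Å a modest $A_V = 0.1$ mag already gives an optical depth of roughly 150 under these assumptions, so the source is effectively fully absorbed.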



4. The Dust Extinction Curve Prior 

Using fitted extinction curves from Milky Way and Magellanic Cloud lines of sight, and 
the statistical methodology that we presented in §2, we now construct a prior that weights the 
eight-dimensional parameter space of the extinction curve model that we presented in §3 such that 
the volume of the solution space is reduced significantly, a priori. We describe the data set in §4.1. 
We model correlations between extinction curve parameters, and construct the prior, in §4.2. 



4.1. The Data Set 

From the literature, we have collected the results of fits to 166 extinction curves: we know the
values of the UV extinction curve parameters $c_1$, $c_2$, $c_3$, $c_4$, $\gamma$, and $x_0$ for all of these lines of sight
from fits to IUE spectra; we know the values of the IR and optical extinction curve parameter $R_V$
for 79 of these lines of sight from IR and optical photometry. These lines of sight sample a wide
variety of interstellar environments in the Milky Way, the LMC, and the SMC. We describe the
breakdown of the data set in detail below; we summarize this information in Table 1.

Ideally, we would fit the models of correlations between extinction curve parameters that
we present in §4.2 directly to the IUE spectra and IR and optical photometry, instead of to the
fitted values of the extinction curve parameters, which represent a compression of the information
contained in the actual data. However, to do so would not be practicable. Consequently, we
instead adopt the best-fit values of the extinction curve parameters, $c_1$, $c_2$, $c_3$, $c_4$, $\gamma$, $x_0$, and $R_V$,
and the fitted uncertainties in these parameters, when available, as the data set, at the expense of
a minor loss of information.

We have drawn the results of fits to Galactic extinction curves from two sources: the cluster
and program samples of Fitzpatrick & Massa (1990) and the sample of Jenniskens & Greenberg
(1993). The FM cluster sample consists of 35 fitted extinction curves, the FM program sample
consists of 45 fitted extinction curves, and the JG sample consists of 115 fitted extinction curves.
For information about how these samples were selected, and about how the selected IUE spectra
were fitted, we refer the reader to these papers. There is some overlap between these samples: 3
of the FM cluster sample extinction curves and 24 of the FM program sample extinction curves
are also in the JG sample. This lowers the number of Galactic extinction curves in our sample
from 195 to 168. Of these extinction curves, Jenniskens & Greenberg (1993) deemed 25 to be of
low quality (see Jenniskens & Greenberg 1993 for details). This lowers the number of Galactic
extinction curves in our sample to 143.

Of the 39 FM program sample lines of sight with high quality extinction curves, Cardelli,
Clayton, & Mathis (1989) found values of $R_V$ for 25 of these lines of sight from BVRIJHKL
photometry. Of the 90 JG sample lines of sight with high quality extinction curves, Aiello et
al. (1988) found values of $R_V$ for 49 of these lines of sight from BVK photometry. Eight of these
lines of sight are in common. This lowers the number of Galactic values of $R_V$ from 74 to 66.

From the fitted parameter values of the 21 high quality extinction curves that the FM and
JG samples have in common, Jenniskens & Greenberg (1993) measured systematic and random
errors between the two groups' fitted values for each extinction curve parameter; we list these
errors in Table 2. Furthermore, these systematic and random errors are comparable in size. In
the interest of creating a uniform data set, primarily to facilitate identification and modeling of
the correlations between these parameters, we have shifted each group's fitted parameter values
by one half of the systematic difference between the two groups' results; this brings both groups'
results into general agreement. We re-inject these systematic errors into the analysis in §4.2.

Secondly, unlike the fitted parameter values of the LMC and SMC extinction curves, which we
introduce below, uncertainties were not determined for each, or any, of the fitted parameter values
of the Galactic extinction curves. Consequently, we adopt the above measured random errors as
indicative of the uncertainties in the fitted parameter values of each of the Galactic extinction
curves in our sample. Technically, since both groups fitted to the same IUE spectra, these random
errors are lower limits. However, Massa & Fitzpatrick (1986) measured nearly identical upper
limits from variations in the fitted parameter values of extinction curves measured along different
lines of sight in the same OB associations; we list these errors in Table 2 also. We adopt the lower
limits because this forces us to conservatively overestimate the extrinsic scatters in the fits of §4.2.

To our Galactic sample, we have added the results of fits to 19 extinction curves, and 10
corresponding values of $R_V$, from the LMC (Misselt, Clayton, & Gordon 1999), and the results of
fits to 4 extinction curves, and 3 corresponding values of $R_V$, from the SMC (Gordon & Clayton
1998). This raises the number of extinction curves in our sample to 166, and the number of
corresponding values of $R_V$ to 79. Uncertainties have been determined for each of the fitted
parameter values of the LMC and SMC extinction curves.
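
The sample bookkeeping of this section can be checked with a few lines of arithmetic. The variable names below are ours; the counts are those quoted in the text.

```python
# Galactic extinction curves: three samples, minus overlaps and low-quality fits.
fm_cluster, fm_program, jg = 35, 45, 115
overlap_with_jg = 3 + 24          # FM cluster and FM program curves also in JG
low_quality = 25                  # curves deemed low quality by JG
galactic_curves = fm_cluster + fm_program + jg - overlap_with_jg - low_quality

# Galactic R_V values: two sources, minus lines of sight in common.
galactic_r_v = 25 + 49 - 8

# Add the LMC and SMC lines of sight.
lmc_curves, lmc_r_v = 19, 10
smc_curves, smc_r_v = 4, 3
total_curves = galactic_curves + lmc_curves + smc_curves
total_r_v = galactic_r_v + lmc_r_v + smc_r_v
```

This reproduces the totals of 143 Galactic curves, 166 curves overall, 66 Galactic values of $R_V$, and 79 values of $R_V$ overall.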
4.2. Correlations between Dust Extinction Curve Parameters

We now model known correlations between the pairs of extinction curve parameters $c_1$ and
$c_2$, and $R_V$ and $c_2$. We also present possible correlations between the triplets of extinction
curve parameters $c_3$, $\gamma$, and $c_2$; $c_4$, $c_3$, and $c_2$; and $c_4$, $\gamma$, and $c_2$; however, we consider these
three-parameter correlations to be too speculative to incorporate into our extinction curve prior.
We also constrain the values of the extinction curve parameters $\gamma$ and $x_0$, which do not vary
significantly across all 166 of the lines of sight in our data set (§4.1). Finally, we use this
information to construct the extinction curve prior.

We begin with a discussion of the extinction curve parameter $c_2$, upon which all of the above
correlations depend. In §3.2, we pointed out that different values of $c_2$ are measured from different
interstellar environments: low values of $c_2$ are measured from young star-forming regions, low to
moderate values of $c_2$ are measured from dense clouds, moderate values of $c_2$ are measured from
the diffuse ISM, and moderate to high values of $c_2$ are measured from old star-forming regions.
This is not to say that $c_2$ is a good measure of an interstellar environment's type: e.g., 30 Doradus
is roughly ten times as active a star-forming region as any region in the SMC (Caplan et al.
1996), yet significantly higher values of $c_2$ are measured from the SMC bar than from 30 Doradus;
this is probably due to the SMC having lower density clouds, which are less able to protect the
classical grains from UV radiation and supernova shocks, than does the LMC (Misselt, Clayton,
& Gordon 1999). However, $c_2$ might be a reasonable, approximate measure of the net ability of
an interstellar environment to affect extinguishing grains. Hence, the above correlations might be
viewed as how the values of other extinction curve parameters, or the correlations between other
extinction curve parameters, vary as a function of a single-parameter measure of net environmental
conditions.

To apply the statistical methodology that we presented in §2.2, (1) the models that we adopt
to describe the above correlations must be slowly varying, i.e., the curve $y = y_c(x;\theta_m)$ from §2.2.2
and §2.2.3 must be varying from linearity only on scales that are larger than the scales given by
the scatter of the data that we presented in §4.1 about this curve; and (2) the density of these
data along this curve also must be slowly varying; i.e., the intrinsic density along this curve,
$g(x,y_c)$, and the selection function, $f(x,y)$, must be varying from constancy only on scales that
are larger than the scales given by this scatter of the data (§2.2.2). As we adopt only constant,
linear, and slowly-varying quadratic models below, the first condition is met. As for the second
condition, the density of the data varies most obviously with the value of $c_2$. This is probably due
to the selection function; e.g., diffuse ISM lines of sight, and consequently, values of $c_2 \sim 0.8$, have
been selected more often than lines of sight through any other type of interstellar environment,
or value of $c_2$. These density variations, however, also occur on scales that are larger than the
scales given by the scatter of the data; consequently, the second condition also appears to be met.
Hence, we appear to be within the realm of the formalism that we presented in §2.2, with but one
caveat. The intrinsic scatters of some of the extinction parameters, namely $c_1$ and $c_2$, are probably
somewhat correlated (Fitzpatrick & Massa 1988). This causes us to somewhat underestimate the
extrinsic scatters of these correlations below, but not significantly.

We begin with the two-parameter correlations: $c_1$ and $c_2$ (Figure 2), and $R_V$ and $c_2$ (Figure
3). In both cases, the parameters are well correlated; however, both of these correlations are more
mathematical than physical in nature. In the case of the former correlation, the linear component
of Equation (57) is observed to pivot about a point at a wavelength of $\lambda \approx 3000$ Å as the slope,
$c_2$, of this component changes from one line of sight to another. If $c_1$ were measured at this
wavelength, instead of at $\lambda = \infty$, this correlation would disappear; consequently, this correlation
is merely an artifact of the fitting procedure by which the values of these parameters were
determined (Carnochan 1986), and not due to any intrinsic physical property. However, the small
size of the extrinsic scatter that we measure below for this correlation testifies to the constancy
of the extinction curve at this pivot point. This is a physical property, since the wavelength of
this pivot point differs from the V-band wavelength, $\lambda \sim 5500$ Å, at which the extinction curve is
normalized.

To the degree that the IR and optical extinction curve is really a one-parameter function of $R_V$,
and to the degree that the linear component of the UV extinction curve is really a one-parameter
function of $c_2$ (since $c_1$ and $c_2$ are strongly correlated), $R_V$ and $c_2$ must be correlated if the
extinction curve is to be continuous and differentiable between optical and UV wavelengths;
consequently, this relation is also mathematical in nature. However, physical information can
be gleaned from the value of $R_V$, which by Equation (53) is a measure of the relative numbers
and/or absorptivities of grains extinguishing in the B band to grains extinguishing in the V band.
How this value changes as the value of $c_2$ is changed provides physical information, perhaps as a
function of environmental conditions, as we have discussed above.

We now model these two correlations, and construct the prior, as described in §2.2.2 and
§2.2.3. We do not include the Orion Nebula lines of sight in either of these fits, because nebular
background contamination artificially lowers the measured values of $c_1$ along these lines of sight
(Panek 1983), and similarly may affect the measured values of $c_2$ (Fitzpatrick & Massa 1988); this
effect has not been corrected for in these data (Fitzpatrick & Massa 1988).

Given how the values of $c_1$ and $c_2$ were determined, we model the first of these correlations
with a function that is linear in $c_2$:

$$c_1(c_2) = b + m(c_2 - \bar{c}_2), \tag{67}$$

where $\bar{c}_2$ is the sample's median value of $c_2$. The 1-σ uncertainty in $c_1$ as a function of $c_2$ is
approximately given by

$$\sigma_{c_1}(c_2) = \left\{ \left[ \sigma_b \frac{\partial c_1}{\partial b} \right]^2 + \left[ \sigma_m \frac{\partial c_1}{\partial m} \right]^2 + \left[ \sigma_{c_2} \frac{\partial c_1}{\partial c_2} \right]^2 + \sigma_{c_1}^2 \right\}^{1/2}, \tag{68}$$

where $\sigma_b$ and $\sigma_m$ are the fitted 1-σ uncertainties in the parameters b and m, and $\sigma_{c_1}$ and $\sigma_{c_2}$ are
the fitted 1-σ extrinsic scatters in the $c_1$ and $c_2$ dimensions (§2.2.3). Assuming a flat prior, we find:
$\bar{c}_2 = 0.711$, $b = -0.064$, $\sigma_b = 0.026$, $m = -3.275$, $\sigma_m = 0.083$, $\sigma_{c_1} = 0.088$, and $\sigma_{c_2} = 0.008$. Using
Equation (68) and these fitted values, we plot approximate 1-, 2-, and 3-σ confidence regions in
Figure 2. Re-injection of the systematic errors between the FM and JG samples that we removed
in §4.1 (Table 2) increases $\sigma_{c_1}$ to 0.176 and $\sigma_{c_2}$ to 0.037.
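
As a worked illustration, the linear model and its 1-σ uncertainty can be evaluated as follows. This is a sketch rather than the fitting code itself: the names are ours, the partial derivatives are those of the linear model ($\partial c_1/\partial b = 1$, $\partial c_1/\partial m = c_2 - \bar{c}_2$, $\partial c_1/\partial c_2 = m$), and the extrinsic scatter values are our reading of the fitted values quoted above, before re-injection of the systematic errors.

```python
import math

# Fitted values quoted in the text (median c2, intercept, slope, uncertainties).
C2_MEDIAN, B, SIGMA_B, M, SIGMA_M = 0.711, -0.064, 0.026, -3.275, 0.083
SIGMA_C1, SIGMA_C2 = 0.088, 0.008   # extrinsic scatters, before re-injection

def c1_model(c2):
    """Linear model c1(c2) = b + m * (c2 - median c2)."""
    return B + M * (c2 - C2_MEDIAN)

def sigma_c1(c2, sigma_c1_ext=SIGMA_C1, sigma_c2_ext=SIGMA_C2):
    """Quadrature sum of parameter uncertainties and extrinsic scatters."""
    terms = [
        SIGMA_B * 1.0,                # sigma_b * d(c1)/d(b)
        SIGMA_M * (c2 - C2_MEDIAN),   # sigma_m * d(c1)/d(m)
        sigma_c2_ext * M,             # sigma_c2 * d(c1)/d(c2)
        sigma_c1_ext,                 # extrinsic scatter in c1
    ]
    return math.sqrt(sum(t ** 2 for t in terms))
```

At the median value of $c_2$ the slope uncertainty drops out, and the total uncertainty is about 0.095; passing `sigma_c1_ext=0.176` and `sigma_c2_ext=0.037` gives the broader band that follows from re-injecting the systematic errors.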

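The $R_V$-$c_2$ correlation clearly is non-linear; however, it is slowly varying, so we model it with
a function that is quadratic in $c_2$:

$$R_V(c_2) = b + m(c_2 - \bar{c}_2) + n(c_2 - \bar{c}_2)^2. \tag{69}$$

The 1-σ uncertainty in $R_V$ as a function of $c_2$ is approximately given by

$$\sigma_{R_V}(c_2) = \left\{ \left[ \sigma_b \frac{\partial R_V}{\partial b} \right]^2 + \left[ \sigma_m \frac{\partial R_V}{\partial m} \right]^2 + \left[ \sigma_n \frac{\partial R_V}{\partial n} \right]^2 + \left[ \sigma_{c_2} \frac{\partial R_V}{\partial c_2} \right]^2 + \sigma_{R_V}^2 \right\}^{1/2}, \tag{70}$$

where $\sigma_n$ is the fitted 1-σ uncertainty in the parameter n, and $\sigma_{R_V}$ is the fitted 1-σ extrinsic
scatter in the $R_V$ dimension. Assuming a flat prior, we find: $\bar{c}_2 = 0.721$, $b = 3.228$, $\sigma_b = 0.053$,
$m = -2.685$, $\sigma_m = 0.159$, $n = 1.806$, $\sigma_n = 0.129$, $\sigma_{R_V} = 0.000$, and $\sigma_{c_2} = 0.142$. Using Equation
(70) and these fitted values, we plot approximate 1-, 2-, and 3-σ confidence regions in Figure 3.
Re-injection of the systematic errors between the FM and JG samples increases $\sigma_{R_V}$ to 0.112 and
$\sigma_{c_2}$ to 0.147.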
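
The quadratic model can be evaluated in the same way. The sketch below uses the fitted values quoted above (our reading of them, before re-injection of the systematic errors); the names are ours, and the partial derivatives follow from the quadratic form: $\partial R_V/\partial b = 1$, $\partial R_V/\partial m = d$, $\partial R_V/\partial n = d^2$, and $\partial R_V/\partial c_2 = m + 2nd$, with $d = c_2 - \bar{c}_2$.

```python
import math

# Fitted values quoted in the text for the R_V - c2 correlation.
C2_MED_RV, B_RV, SIG_B_RV = 0.721, 3.228, 0.053
M_RV, SIG_M_RV, N_RV, SIG_N_RV = -2.685, 0.159, 1.806, 0.129
SIG_RV_EXT, SIG_C2_EXT = 0.000, 0.142   # extrinsic scatters, before re-injection

def r_v_model(c2):
    """Quadratic model R_V(c2) about the sample's median c2."""
    d = c2 - C2_MED_RV
    return B_RV + M_RV * d + N_RV * d ** 2

def sigma_r_v(c2, sig_rv_ext=SIG_RV_EXT, sig_c2_ext=SIG_C2_EXT):
    """Quadrature sum of parameter uncertainties and extrinsic scatters."""
    d = c2 - C2_MED_RV
    local_slope = M_RV + 2.0 * N_RV * d      # d(R_V)/d(c2)
    terms = [SIG_B_RV, SIG_M_RV * d, SIG_N_RV * d ** 2,
             sig_c2_ext * local_slope, sig_rv_ext]
    return math.sqrt(sum(t ** 2 for t in terms))
```

At the median value of $c_2$ this gives $R_V = 3.228$ with a 1-σ width of about 0.38, dominated by the extrinsic scatter in $c_2$ projected through the local slope.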

We now consider the three-parameter correlation $c_3$, $\gamma$, and $c_2$ (Figure 4). A strong correlation
exists between $c_3$ and $\gamma$ for Galactic lines of sight ($c_2 \sim 2/3$) (Fitzpatrick & Massa 1988; Jenniskens
& Greenberg 1993); however, inclusion of the Orion Nebula-like lines of sight ($c_2 \sim 0$), and the
LMC and SMC wing lines of sight ($c_2 \sim 4/3$) ruins this previously-determined correlation; the
SMC bar lines of sight ($c_2 \sim 7/3$) are not constraining since $\gamma$ cannot be well determined when
$c_3 \sim 0$. These lines of sight call for a shallower relation. Physically, this probably corresponds to
the destruction or alteration of the UV bump grains by UV radiation and/or supernova shocks
(§3.2, Figure 5; see also Clayton, Gordon, & Wolff 2000).

Weak correlations exist between $c_4$ and $c_3$, and $c_4$ and $\gamma$ for Galactic lines of sight (Fitzpatrick
& Massa 1988; Jenniskens & Greenberg 1993); indeed, weak positive correlations can be seen
in Figure 6, if only the Galactic lines of sight are considered. However, inclusion of the LMC
and SMC lines of sight also ruins these previously-determined correlations. The destruction or
alteration of UV bump grains by the environment probably accounts for the shift to lower values
of $c_3$ in the top panel of Figure 6. Hydrogenation might account for the greater scatter in the
bottom panel of Figure 6 (§3.2).

In any case, we do not attempt to model and constrain the possible correlations in Figures
4, 5, and 6. First of all, in distant galaxies, these grain species might occur in different relative
abundances, perhaps due to different relative metallicities. Secondly, even if this is not the case, if
the extinction is due primarily to dust that is local to the burst, the relative abundances of these
grain species may be altered by the burst itself, as well as by the afterglow (Lamb & Reichart
2000b). Consequently, no constraint can be placed between the extinction curve parameters $c_2$,
$c_3/\gamma^2$, and $c_4$.

The values of $\gamma$ and $x_0$ are approximately constant across all 166 lines of sight. We find
their values to be $\gamma = 0.958 \pm 0.088$ and $x_0 = 4.593 \pm 0.020$. Since the width of the UV
bump is approximately equal to the widths of the photometric bands ($\Delta\nu/\nu \sim 0.2$), and since
the uncertainty in the center of the UV bump is significantly smaller than the widths of the
photometric bands ($\Delta\nu/\nu \sim 0.004 \ll 0.2$), when fitting to afterglow photometry, precise values
of these parameters cannot be extracted from the data; i.e., the fitted solutions should largely
resemble the adopted prior, particularly in the case of the bump center parameter, $x_0$.

We now construct from these results an extinction curve prior, in accordance with the
examples of §2.2.1. For the extinction curve parameters $R_V$, $c_1$, $c_2$, $\gamma$, and $x_0$, we recommend that
the following prior be used:

$$p(R_V, c_1, c_2, \gamma, x_0 | I) = G[R_V, R_V(c_2), 3\sigma_{R_V}(c_2)]\, G[c_1, c_1(c_2), 3\sigma_{c_1}(c_2)] \times G(\gamma, 0.958, 0.264)\, G(x_0, 4.593, 0.060). \tag{71}$$

Here, we have conservatively tripled the 1-σ uncertainties of the component priors, simply because
these priors are determined solely from information that is local to our galaxy. For the extinction
curve parameters $A_V$, $c_3/\gamma^2$, and $c_4$, we conservatively recommend that a flat prior be used.
Altogether, this prior weights the eight-dimensional parameter space of the extinction curve model
that we presented in §3 such that the volume of the solution space is reduced significantly, a priori.
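
A prior of this form is convenient to evaluate in logarithmic form when it is combined with a likelihood. The sketch below assumes that $G(x, \mu, \sigma)$ denotes a normalized Gaussian in $x$ with mean $\mu$ and standard deviation $\sigma$; the function names are ours, and the four callables passed in (the fitted relations $R_V(c_2)$ and $c_1(c_2)$ and their 1-σ uncertainties) are supplied by the user.

```python
import math

def log_gauss(x, mean, sigma):
    """Logarithm of a normalized Gaussian density (assumed form of G)."""
    return -0.5 * ((x - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def log_extinction_prior(r_v, c1, c2, gamma, x0,
                         r_v_of_c2, sigma_r_v_of_c2,
                         c1_of_c2, sigma_c1_of_c2):
    """Log prior on (R_V, c1, c2, gamma, x0); flat in A_V, c3/gamma^2, and c4.

    The 1-sigma widths of the two correlation terms are tripled, and the
    gamma and x0 widths (0.264 and 0.060) are already tripled values.
    """
    return (log_gauss(r_v, r_v_of_c2(c2), 3.0 * sigma_r_v_of_c2(c2))
            + log_gauss(c1, c1_of_c2(c2), 3.0 * sigma_c1_of_c2(c2))
            + log_gauss(gamma, 0.958, 0.264)
            + log_gauss(x0, 4.593, 0.060))
```

Because the remaining three parameters carry flat priors, they contribute only an additive constant to the log prior and can be omitted when comparing parameter values.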

Finally, we comment on the possibility of an evolving extinction curve. As mentioned above, 
if an afterglow is extinguished by dust that is local to a burst, energetic photons, both from the 
burst and from the afterglow, may alter the extinction curve with time (Lamb & Reichart 2000b). 
However, since all afterglows observed to date have faded more rapidly than $F_\nu \sim t^{-1}$ at optical
through X-ray wavelengths, the majority of these energetic photons are probably emitted during 
the first few seconds or minutes of the afterglow, if not during the burst itself. Hence, any dust 
destruction or alteration that may occur, should occur on such timescales. By restricting oneself to 
photometry taken hours or longer after a burst, one should be able to safely ignore the possibility 
of an evolving extinction curve. 

5. The Lyα Forest Flux Deficit Model and Prior

At redshifts of $z \gtrsim 2$, the Lyα forest will absorb light at optical wavelengths, and consequently
cannot be ignored (Fruchter 1999a; Lamb & Reichart 2000a). We present a two-parameter model
that describes the effects of the Lyα forest on the spectral flux distributions of afterglows (and on
the spectral flux distributions of all extragalactic point sources for that matter). Using Lyα forest
flux deficit measurements from quasar absorption line systems, we construct a prior that weights
this two-dimensional space such that the volume of the solution space is reduced significantly.
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In the study of quasar absorption line systems, the quantity called flux deficit, denoted Da, 
is defined by 

F,^ (observed) \ 



Da 



1 



(72) 



-Fjy (continuum) / ' 

where this quantity is averaged over the wavelength range between the emission lines Lyα and 
Lyβ + O VI that is not affected by emission line wings, and only if the continuum can be reliably 
extrapolated from the unabsorbed spectrum at longer wavelengths (Oke & Korycansky 1982). Zuo 
& Phinney (1993), Zuo (1993), and Lu & Zuo (1994) model this quantity by 

$$ D_A(z) \approx 1 - \exp\left[-a\left(\frac{1+z}{1+\bar{z}}\right)^{b}\right] , \tag{73} $$



where a and b are parameters whose values are determined by fitting to flux deficit measurements, 
z is the redshift corresponding to the central wavelength of the range over which the quantity 
$D_A$ is averaged, and $\bar{z}$ is the median value of z for the sample to which one is fitting. The 1-σ 
uncertainty in $D_A$ as a function of z is approximately given by Equation (74), where $\sigma_a$ and 
$\sigma_b$ are the fitted 1-σ uncertainties in the parameters a and b, and $\sigma_{D_A}$ and $\sigma_z$ are 
the fitted 1-σ extrinsic scatters in the $D_A$ and z dimensions (§2.2.3). 
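
As a rough guide to how these terms can enter, first-order error propagation through a model of the form $D_A = 1 - e^{-\tau}$ gives the expression below. This is only a sketch: it assumes that the errors in a and b are uncorrelated, that the extrinsic scatters add in quadrature, and that the extrinsic scatter in the $D_A$ dimension (written here with the hypothetical label $\sigma_{D_A,\mathrm{ext}}$ to distinguish it from the total) is independent of z. It is not necessarily identical to Equation (74).

```latex
\tau(z) \equiv a\left(\frac{1+z}{1+\bar{z}}\right)^{b},
\qquad
\sigma^2_{D_A}(z) \approx \left[1 - D_A(z)\right]^2 \tau^2(z)
  \left[ \left(\frac{\sigma_a}{a}\right)^2
       + \left(\sigma_b \ln\frac{1+z}{1+\bar{z}}\right)^2
       + \left(\frac{b\,\sigma_z}{1+z}\right)^2 \right]
  + \sigma^2_{D_A,\mathrm{ext}} .
```

Each bracketed term is a fractional change in $\tau$: from a, from b, and from a shift in z. The term in $\sigma_b$ vanishes at $z = \bar{z}$, which is one practical reason to normalize the redshift dependence at the sample median.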

Assuming a flat prior, and adopting Sample 4 of Zuo & Lu (1993), which is a combination 
of Sample 2 of Zuo & Lu (1993) (see Zuo & Lu 1993 for details) and the high-redshift (z ~ 4) 
sample of Schneider, Schmidt, & Gunn (1989a,b, 1991), we find: $\bar{z}$ = 2.994, a = 0.306, $\sigma_a$ = 0.010, 
b = 4.854, $\sigma_b$ = 0.188, $\sigma_{D_A}$ = 0.000, and $\sigma_z$ = 0.165. We plot Sample 4 of Zuo & Lu (1993) 
and, using Equation (74) and these fitted values, approximate 1-, 2-, and 3-σ confidence regions 
in Figure 7. The extent of the scatter about the best fit in Figure 7 is largely a reflection of the 
extent of the wavelength range over which these values of $D_A$ were averaged. This wavelength 
range corresponds to $\Delta\nu/\nu \sim 0.2$, which is typical of the photometric bands. In other words, the 
scatter in Figure 7, very conveniently, is typical of what one would find if Lyα forest flux deficits 
were measured from afterglows photometrically. Consequently, the Lyα forest flux deficit prior for 
photometric, as opposed to spectroscopic, data is given by (§2.2.1) 



$$ p(D_A, z \,|\, I) = G[D_A, D_A(z), \sigma_{D_A}(z)] . \tag{75} $$
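
The model and prior of this section can be evaluated numerically along the following lines. This is a minimal sketch, not the code used for this paper: it assumes that Equation (73) has the form $D_A(z) = 1 - \exp\{-a[(1+z)/(1+\bar{z})]^b\}$, that G in Equation (75) denotes a normalized Gaussian in $D_A$, and that the caller supplies the 1-σ width at the redshift of interest. The function and constant names are hypothetical.

```python
import math

# Best-fit values quoted in the text for Sample 4 of Zuo & Lu (1993).
Z_MED = 2.994   # median redshift of the sample
A_FIT = 0.306   # parameter a
B_FIT = 4.854   # parameter b


def flux_deficit(z, a=A_FIT, b=B_FIT, z_med=Z_MED):
    """Mean Lya forest flux deficit D_A(z), assuming the form
    D_A = 1 - exp[-a ((1 + z) / (1 + z_med))**b]."""
    return 1.0 - math.exp(-a * ((1.0 + z) / (1.0 + z_med)) ** b)


def gaussian_prior(d_a, z, sigma):
    """Prior density for a flux deficit d_a at redshift z, assuming G is a
    normalized Gaussian centred on the model value with 1-sigma width
    `sigma` (the width must be supplied by the caller)."""
    mu = flux_deficit(z)
    norm = sigma * math.sqrt(2.0 * math.pi)
    return math.exp(-0.5 * ((d_a - mu) / sigma) ** 2) / norm


if __name__ == "__main__":
    for z in (2.0, 3.0, 4.0):
        print(f"z = {z:.1f}  D_A = {flux_deficit(z):.3f}")
```

The fraction of continuum flux transmitted between Lyα and Lyβ in the source frame is then simply `1 - flux_deficit(z)`, which is the factor by which a model spectrum would be attenuated in that wavelength range.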



6. Example Extinguished and Absorbed Spectral Flux Distributions 

We now demonstrate the breadth of the models of §3 and §5. Using these models, we plot in 
Figure 8 example spectral flux distributions that have been extinguished by dust in a host galaxy, 
absorbed by H I in the host galaxy, redshifted, and absorbed by the Lyα forest, for a wide variety 
of plausible extinction curves and redshifts. We have adopted an intrinsic spectrum of $F_\nu \propto \nu^{\ldots}$, 
and we allow the values of $A_V$, $c_2$, $c_3/\gamma^2$, $c_4$, and z to vary over observed/reasonable ranges. The 
values of $R_V$, $c_1$, $\gamma$, $x_0$, and $D_A$ we take from the best fits of §4.2 and §5. Finally, we convolve 
each extinguished, absorbed, and redshifted spectrum with a logarithmically flat smearing function 
of width $\Delta\nu = 0.2\nu$, converting each spectrum to a spectral flux distribution; i.e., we model how 
these spectra would be sampled photometrically, as opposed to spectroscopically (§5). Clearly, a 
single intrinsic spectrum can manifest itself in a multitude of ways, and exhibit a variety of broad 
spectral features, including a shoulder in the infrared, the UV bump, the Lyα forest, and the 
Lyman limit. 
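
The smearing step can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code used to produce Figure 8: "logarithmically flat" is interpreted here as a top-hat of uniform weight per unit ln ν, with a full width of 0.2 in ln ν (so that the window spans roughly ν(1 ± 0.1)). The function name and its arguments are hypothetical.

```python
import numpy as np


def smear_log_flat(nu, f_nu, width=0.2, n_sub=201):
    """Average f_nu over a top-hat window in ln(nu) of full width `width`.

    nu must be strictly increasing. Points whose window runs off either
    end of the grid are returned as NaN rather than extrapolated.
    """
    nu = np.asarray(nu, dtype=float)
    f_nu = np.asarray(f_nu, dtype=float)
    ln_nu = np.log(nu)
    half = 0.5 * width
    out = np.full(nu.shape, np.nan)
    for i, centre in enumerate(ln_nu):
        lo, hi = centre - half, centre + half
        if lo < ln_nu[0] or hi > ln_nu[-1]:
            continue
        # Resample the window on a uniform grid in ln(nu) and take the
        # trapezoidal mean, i.e. uniform weight per unit ln(nu).
        x = np.linspace(lo, hi, n_sub)
        y = np.interp(x, ln_nu, f_nu)
        out[i] = (0.5 * (y[0] + y[-1]) + y[1:-1].sum()) / (n_sub - 1)
    return out


if __name__ == "__main__":
    nu = np.logspace(14.0, 16.0, 400)      # frequency grid in Hz
    spectrum = np.where(nu < 1e15, 1.0, 0.0)  # a sharp, Lyman-limit-like edge
    smeared = smear_log_flat(nu, spectrum)
    print(np.nanmin(smeared), np.nanmax(smeared))
```

With this choice of window a pure power law is only rescaled by a constant factor, so the smearing changes the shape of a spectrum only near sharp features such as the UV bump, the onset of the Lyα forest, and the Lyman limit.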

7. Conclusions 

In this paper, we have presented a very general, Bayesian inference formalism with which 
afterglow models can be tested, and with which the parameter values of acceptable models can be 
constrained. Furthermore, we have developed and presented a formalism for the construction of 
Bayesian prior probability distributions from multi-dimensional data sets, which we have drawn on 
extensively. We have presented models that describe how extinction by dust, both in host galaxies 
and in our galaxy, and absorption by the Lyα forest and by H I in host galaxies, change the 
intrinsic spectra of afterglows. Then, applying the above formalism, we constructed a prior that 
weights the additional, but necessary, parameter space of these models such that the volume of 
the solution space is reduced significantly, a priori. These models and priors will lead to the more 
realistic modeling of afterglows, particularly at IR through UV wavelengths, in future papers. 

Finally, we emphasize that the phenomena for which we have presented models and priors 
in this paper - extinction by dust and absorption by the Lyα forest - affect identically the light 
from all other extragalactic point sources.^ Consequently, the work presented in this paper is as 
applicable to high-redshift Type Ia supernovae and quasars, for example, as it is to the afterglows 
of bursts. Since the effects of extinction and absorption are most dramatic at UV wavelengths in 
the source frame, these models and priors will be particularly useful for the modeling of optical 
photometry of high-redshift point sources. 

^By point source, we mean either that the host galaxy contributes a negligible fraction of the total light within 
the point spread function of the point source, or that this contribution of the host galaxy to the total light can be 
measured directly - which can be done in the case of a fading point source after it fades away - and consequently 
separated from that of the point source. Otherwise, one must model not a single point source in a distribution of 
dust, but instead a distribution of point sources in a distribution of dust, which is a significantly more challenging 
endeavor, but certainly not impossible (e.g., Witt, Thronson, & Capuano 1992; Gordon, Calzetti, & Witt 1997). 
Similarly, light from a point source that is either scattered or absorbed and thermally re-emitted into the line of sight 
can contribute non-negligibly to, and even dominate, the direct light of a fading point source at late times, due to the 
time delay with which the indirect light is received (e.g., Reichart 2000). Again, one must either use data from early 
times, when the contribution of the "dust echo" is negligible, or measure the contribution of the dust echo at late 
times, when the contribution of the fading point source is negligible, and use this information to properly interpret 
the data at intermediate times, when neither component can be ignored. 

Table 1. Breakdown of the Extinction Curve Data Set 



Galaxy      Extinction Curve Sample^a   Number of Extinction Curves   Number of Values of R_V   R_V Reference^a
MW          FM Cluster                  35                            ...                       ...
            FM Program                  45                            25                        CCM
            JG                          115                           49                        Aea
            Combined^b                  143                           66                        ...
LMC         MCG                         19                            10                        MCG
SMC         GC                          4                             3                         GC
Combined    Combined                    166                           79                        ...

^a FM - Fitzpatrick & Massa 1990; JG - Jenniskens & Greenberg 1993; MCG - Misselt, Clayton, & Gordon 1999; GC 
- Gordon & Clayton 1998; CCM - Cardelli, Clayton, & Mathis 1989; Aea - Aiello et al. 1988. 

^b Overlap between the samples and low-quality data have been removed (see §4.1 for details). 

Table 2. Systematic and Random Errors Between the FM and JG Samples 



Parameter   Systematic Error^a,b   Random Error Lower Limit^b   Random Error Upper Limit^c
c_1         0.304                  0.259                        ...
c_2         -0.073                 0.050                        0.08
c_3         0.316                  0.252                        0.27
c_4         0.082                  0.063                        0.08
γ           0.036                  0.039                        0.04
x_0         -0.013                 0.010                        0.01
R_V         -0.224                 0.205                        ...

^a JG - FM 

^b Based on the 20 non-Orion Nebula, high-quality extinction curves that the FM and JG samples 
have in common (see §4.1 for details). 

^c Based on variations along different lines of sight in the same OB associations (see §4.1 for 
details). From Table 2 of Jenniskens & Greenberg 1993. 

Fig. 1. — An extinction curve that is typical of the diffuse ISM of our galaxy. The dotted lines 
mark the three components of the UV extinction curve of Fitzpatrick & Massa (1988) (see §3.2). 

Fig. 2. — The correlation between the extinction curve parameters $c_1$ and $c_2$. The dotted lines 
mark approximate 1-, 2-, and 3-σ confidence regions (see §4.2). The filled squares are from the 
Fitzpatrick & Massa (1988; FM) cluster sample, the filled circles are from the FM program sample, 
the empty triangles are from the Jenniskens & Greenberg (1993; JG) sample, the empty squares 
are from both the FM cluster sample and the JG sample, the empty circles are from both the FM 
program sample and the JG sample, the solid error bars denote the Misselt, Clayton, & Gordon 
(1999) LMC sample, and the dotted error bars denote the Gordon & Clayton (1998) SMC sample 
(see §4.1). The error bars of the Galactic points are discussed in §4.1. The encircled points denote 
lines of sight through the Orion Nebula region. 

Fig. 3. — The correlation between the extinction curve parameters $R_V$ and $c_2$. The dotted lines 
mark approximate 1-, 2-, and 3-σ confidence regions (see §4.2, Figure 2). 

Fig. 4. — The correlation between the extinction curve parameters $c_3$ and $\gamma$ as a function of $c_2$: 
the Orion Nebula region lines of sight have $c_2 \sim 0$, the Galactic lines of sight have $c_2 \sim 2/3$, the 
LMC and SMC wing lines of sight have $c_2 \sim 4/3$, and the SMC bar lines of sight have $c_2 \sim 7/3$ 
(see §4.2, Figure 2). 

Fig. 5. — How the strength of the UV bump, as measured by $c_3$, varies with environmental 
conditions, as measured by $c_2$ (see §3.2, §4.2, Figure 2). 

Fig. 6. — The correlation between the extinction curve parameters $c_4$ and $c_3$ (top panel), and $c_4$ 
and $\gamma$ (bottom panel) as a function of $c_2$ (see §4.2, Figure 2, Figure 4). 

Fig. 7. — The correlation between Lyα forest flux deficit, $D_A$, and redshift, z. The dotted lines 
mark approximate 1-, 2-, and 3-σ confidence regions (see §5). The circles denote Sample 2 of Zuo 
& Lu (1993), and the squares denote the high-redshift sample of Schneider, Schmidt, & Gunn 
(1989a,b, 1991). 

Fig. 8. — Example extinguished and absorbed spectral flux distributions (see §6). The dotted 
curve in each panel corresponds to an unextinguished, unabsorbed spectral flux distribution, given 
by $F_\nu \propto \nu^{\ldots}$. The top solid curve in each of the first five panels is given by 
($A_V$, $c_2$, $c_3/\gamma^2$, $c_4$, z) = (1,0,0,0,1). The lower two solid curves in each of these five panels are 
given by increasing, as marked, the value of a single one of these five parameters. In the fifth 
(redshift) panel, we have fixed the spectral flux at long wavelengths. The solid curves in the sixth 
panel are typical of extinction by dust in (from top to bottom) the Orion Nebula, the diffuse ISM 
of our galaxy, the LMC and the SMC wing, and the SMC bar, for a variety of redshifts. 